Tutorial: LLM-Powered Review

This tutorial shows how to set up LLM-powered review, which automatically assesses needs_review findings and generates regulatory-quality evidence text.

What LLM Review Does

Without LLM review, 150 Semi findings require manual triage. With it, the LLM examines the code context and determines pass/fail with an evidence paragraph suitable for Module A documentation.

Without LLM: 55 pass, 40 fail, 57 needs_review
With LLM:    85 pass, 48 fail, 19 needs_review

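In this example the LLM resolves 38 of the 57 needs_review findings: the pass count rises by 30 and the fail count by 8, each with evidence citations. The remaining 19 still need manual review.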
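The arithmetic behind the two rows can be checked with a short sketch. The dictionaries simply restate the counts above; the variable names are illustrative and not part of the tool.

```python
# Counts copied from the comparison above.
without_llm = {"pass": 55, "fail": 40, "needs_review": 57}
with_llm = {"pass": 85, "fail": 48, "needs_review": 19}

# Net change per status after LLM review.
delta = {status: with_llm[status] - without_llm[status] for status in without_llm}

# Findings moved out of needs_review by the LLM.
resolved = without_llm["needs_review"] - with_llm["needs_review"]

# Sanity check: the LLM reclassifies findings, it should not add or drop any.
total_before = sum(without_llm.values())
total_after = sum(with_llm.values())

print(delta)                      # {'pass': 30, 'fail': 8, 'needs_review': -38}
print(resolved)                   # 38
print(total_before, total_after)  # 152 152
```

Both rows sum to the same total, which is a quick way to confirm that a before/after comparison covers the same set of findings.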

Option 1: Local Ollama (Private)

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Pull a model (70B recommended for code review):

ollama pull llama3.1:70b
# Or faster: ollama pull llama3.1:8b
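Before scanning, it can help to confirm that the model was actually pulled. The sketch below queries Ollama's local model-listing endpoint (/api/tags on the default port 11434). Whether fleet talks to that same default address is an assumption here, and the helper names has_model and fetch_tags are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for listing pulled models.
TAGS_URL = "http://localhost:11434/api/tags"

def has_model(tags: dict, name: str) -> bool:
    """Return True if `name` appears in a parsed /api/tags response."""
    return any(m.get("name") == name for m in tags.get("models", []))

def fetch_tags(url: str = TAGS_URL, timeout: float = 3.0) -> dict:
    """Fetch the model list from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# Offline example using a response of the same shape.
sample = {"models": [{"name": "llama3.1:8b"}]}
print(has_model(sample, "llama3.1:8b"))   # True
print(has_model(sample, "llama3.1:70b"))  # False
```

With a server running, has_model(fetch_tags(), "llama3.1:70b") performs the same check against the real model list.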

Scan with LLM:

fleet scan --path . --llm ollama --output pretty

Option 2: Claude API (Highest Quality)

Get an Anthropic API key from https://console.anthropic.com

Set the key:

export ANTHROPIC_API_KEY=sk-ant-api03-...

Scan with Claude:

fleet scan --path . --llm claude --output pretty

Claude produces the highest-quality evidence text: well-structured paragraphs with specific code citations.
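A wrapper script can choose between the two options depending on whether an API key is available. The following is a minimal sketch of one possible policy, not something fleet does itself: the function name pick_backend and the fall-back-to-Ollama rule are assumptions.

```python
import os

def pick_backend(env=None) -> str:
    """Pick a value for --llm: "claude" if an API key is set, else "ollama"."""
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only key as "not configured".
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "claude" if key else "ollama"

print(pick_backend({"ANTHROPIC_API_KEY": "sk-ant-api03-..."}))  # claude
print(pick_backend({}))                                         # ollama
```

The returned string can be passed straight to the --llm flag used in the scan commands above.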

Compare Evidence Quality

Here's the same finding, CRYPTO-01-R1, reviewed three ways: first with no LLM, then with a local Ollama model, then with Claude.

CRYPTO-01-R1: needs_review (confidence: 0.60)
Message: "SHA-256 usage detected"

No evidence text; the finding requires manual review.

CRYPTO-01-R1: pass (confidence: 0.78)
Evidence: "The code uses SHA-256 for hashing via the sha2 crate.
This meets the requirement for state-of-the-art cryptography."

Basic evidence: the assessment is correct but light on detail.

CRYPTO-01-R1: pass (confidence: 0.95)
Evidence: "The product uses SHA-256 (256-bit) for all
security-relevant hashing operations via the sha2 crate
(src/auth.rs:42). The implementation uses Sha256::new()
from the RustCrypto project, a well-maintained cryptographic
library. No usage of deprecated algorithms (MD5, SHA-1) was
detected in security contexts. This satisfies CRA Annex I,
I.3(a) for state-of-the-art cryptographic mechanisms."

Detailed, regulatory-quality evidence with specific citations.

CI Usage Patterns

# PR checks: fast, no LLM (3 seconds)
fleet scan --llm off --ci

# Main branch: thorough, with Claude (30-60 seconds)
fleet scan --llm claude --ci

# Nightly: full review with detailed evidence
fleet scan --llm claude --report weekly-report.md
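If the three patterns are driven from one CI script, the mapping from trigger to command can be kept in a single place. This is a sketch only: the trigger names ("pull_request", "main", "nightly") and the function scan_command are hypothetical, while the fleet arguments are copied verbatim from the commands above.

```python
def scan_command(trigger: str) -> list[str]:
    """Return the fleet invocation for a CI trigger (trigger names are assumed)."""
    commands = {
        # PR checks: fast, no LLM.
        "pull_request": ["fleet", "scan", "--llm", "off", "--ci"],
        # Main branch: thorough, with Claude.
        "main": ["fleet", "scan", "--llm", "claude", "--ci"],
        # Nightly: full review with a written report.
        "nightly": ["fleet", "scan", "--llm", "claude",
                    "--report", "weekly-report.md"],
    }
    if trigger not in commands:
        raise ValueError(f"unknown CI trigger: {trigger!r}")
    return commands[trigger]

print(" ".join(scan_command("pull_request")))  # fleet scan --llm off --ci
```

In a real pipeline the returned list could be handed to subprocess.run(..., check=True) so that a failing scan fails the job.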

Evidence Provenance

Every LLM-reviewed finding includes provenance tracking:

{
  "llm_provenance": {
    "backend": "claude",
    "model": "claude-sonnet-4-6",
    "prompt_version": "v1.0.0",
    "token_usage": { "input": 2340, "output": 856 },
    "confidence": 0.95
  }
}

This ensures traceability: you can always see which model produced which evidence, at what confidence level, using which prompt version.
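As an illustration of how the provenance block might be consumed in an audit script, the sketch below parses the record shown above, prints a one-line summary, and flags low-confidence findings. The helper names and the 0.8 threshold are assumptions chosen for the example, not values defined by the tool.

```python
import json

# The provenance record from above, as a JSON string.
RAW = """
{
  "llm_provenance": {
    "backend": "claude",
    "model": "claude-sonnet-4-6",
    "prompt_version": "v1.0.0",
    "token_usage": { "input": 2340, "output": 856 },
    "confidence": 0.95
  }
}
"""

def summarize(record: dict) -> str:
    """Return a one-line audit summary of a finding's provenance."""
    p = record["llm_provenance"]
    tokens = p["token_usage"]["input"] + p["token_usage"]["output"]
    return (f'{p["backend"]}/{p["model"]} prompt={p["prompt_version"]} '
            f'tokens={tokens} confidence={p["confidence"]:.2f}')

def needs_human_check(record: dict, threshold: float = 0.8) -> bool:
    """Flag findings whose confidence is below the (assumed) threshold."""
    return record["llm_provenance"]["confidence"] < threshold

record = json.loads(RAW)
print(summarize(record))
# claude/claude-sonnet-4-6 prompt=v1.0.0 tokens=3196 confidence=0.95
print(needs_human_check(record))  # False
```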