LLM Configuration Guide

Overview

The LLM layer reviews Semi findings from static detectors and produces regulatory-quality evidence text. It is optional: scanning works without it, but Semi findings then remain in the needs_review state until they are triaged, either manually or by the LLM.

Backends

Backend	Privacy	Quality	Cost	Setup
Ollama (local)	Data never leaves machine	Good (model-dependent)	Free	Install Ollama + pull model
Claude (Anthropic)	Data sent to Anthropic API	Excellent	Per-token	Set `ANTHROPIC_API_KEY`
OpenAI-compatible	Data sent to provider	Very good	Per-token	Set `OPENAI_API_KEY`
Off	N/A	N/A	Free	None (default)

Ollama Setup (Local-First)

Install Ollama: https://ollama.com
Pull a model (8B parameters minimum for code review; 70B recommended):

ollama pull llama3.1:8b     # Fast, adequate
ollama pull llama3.1:70b    # Recommended for thorough review
ollama pull codellama:34b   # Code-specialized alternative

Run scan:

fleet scan --path . --llm ollama

Environment variables:

export FLEET_LLM_OLLAMA_URL=http://localhost:11434    # Default
export FLEET_LLM_OLLAMA_MODEL=llama3.1:70b
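
Before a long scan it can help to confirm that the Ollama server is reachable and that the configured model has actually been pulled. The following is a minimal Python sketch, not part of Fleet: it queries Ollama's `GET /api/tags` endpoint, which lists locally pulled models. The helper names `fetch_tags`, `normalize`, and `model_available` are hypothetical.

```python
import json
import urllib.request


def normalize(name: str) -> str:
    """Ollama lists a model pulled without a tag as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Return the parsed body of GET <base_url>/api/tags (locally pulled models)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_available(tags: dict, model: str) -> bool:
    """True if `model` (e.g. the FLEET_LLM_OLLAMA_MODEL value) appears in the tags listing."""
    wanted = normalize(model)
    return any(normalize(m.get("name", "")) == wanted for m in tags.get("models", []))
```

For example, `model_available(fetch_tags("http://localhost:11434"), "llama3.1:70b")` returns `True` once `ollama pull llama3.1:70b` has completed and the server is running.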

Claude Setup

export ANTHROPIC_API_KEY=sk-ant-api03-...
fleet scan --path . --llm claude

Models (via FLEET_LLM_CLAUDE_MODEL):

claude-sonnet-4-6 — Default. Good balance of quality and speed.
claude-opus-4-6 — Deepest analysis. Best for critical assessments.

OpenAI-Compatible Setup

Works with OpenAI, Azure OpenAI, vLLM, Together, LM Studio, or any OpenAI-compatible endpoint.

export OPENAI_API_KEY=sk-...
export FLEET_LLM_OPENAI_BASE_URL=https://api.openai.com/v1    # Default
export FLEET_LLM_OPENAI_MODEL=gpt-4o
fleet scan --path . --llm openai

For Azure OpenAI:

export OPENAI_API_KEY=<azure-key>
export FLEET_LLM_OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment
export FLEET_LLM_OPENAI_MODEL=gpt-4o

For local vLLM:

export OPENAI_API_KEY=dummy    # Ignored by vLLM unless the server was started with --api-key
export FLEET_LLM_OPENAI_BASE_URL=http://localhost:8000/v1
export FLEET_LLM_OPENAI_MODEL=meta-llama/Llama-3.1-70B-Instruct
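
The environment variables across the three setups can be summarized as one lookup per `--llm` value. This is a minimal Python sketch, assuming Fleet reads exactly the variables named in this guide and falls back only to the defaults stated here; `LLMConfig` and `resolve_config` are hypothetical, and a `None` model means the default is not documented in this guide.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    backend: str
    model: Optional[str]      # None: no default documented in this guide
    endpoint: Optional[str]   # None: the backend's built-in endpoint
    api_key: Optional[str]


def resolve_config(backend: str, env: Mapping[str, str]) -> Optional[LLMConfig]:
    """Map a --llm value to the environment variables described above."""
    if backend == "off":
        return None
    if backend == "ollama":
        return LLMConfig(
            backend,
            env.get("FLEET_LLM_OLLAMA_MODEL"),
            env.get("FLEET_LLM_OLLAMA_URL", "http://localhost:11434"),
            None,  # local server, no API key
        )
    if backend == "claude":
        key = env.get("ANTHROPIC_API_KEY")
        if not key:
            raise ValueError("--llm claude requires ANTHROPIC_API_KEY")
        return LLMConfig(backend, env.get("FLEET_LLM_CLAUDE_MODEL", "claude-sonnet-4-6"), None, key)
    if backend == "openai":
        key = env.get("OPENAI_API_KEY")
        if not key:
            # Assumption: a key is always required, hence "dummy" for local vLLM.
            raise ValueError("--llm openai requires OPENAI_API_KEY")
        return LLMConfig(
            backend,
            env.get("FLEET_LLM_OPENAI_MODEL"),
            env.get("FLEET_LLM_OPENAI_BASE_URL", "https://api.openai.com/v1"),
            key,
        )
    raise ValueError(f"unknown backend: {backend!r}")
```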

Prompt Strategy

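Each requirement category has a specialized system prompt grounded in the language of Annex I (essential cybersecurity requirements) of the EU Cyber Resilience Act (CRA). Prompts are versioned (v1.0.0) so that every evidence record can be traced back to the exact prompt that produced it.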

Categories with custom prompts:

CRYPTO — Algorithm approval, key sizes, modes of operation
NET — Transport security, credential handling, attack surface
AUTH — Password storage, session management, JWT validation
INPUT — Injection prevention, XSS, path traversal
STOR — Encryption at rest, access controls
LOG — Event coverage, data protection, structured format
UPD — Update integrity, signature verification, rollback
CONFIG — Secure defaults, debug mode
AI — Model integrity, prompt injection, data exposure

The LLM responds with structured JSON:

{
  "assessment": "pass | fail | inconclusive",
  "confidence": 0.92,
  "evidence_text": "<regulatory-quality paragraph>",
  "citations": [{ "file": "...", "line": 42, "snippet": "..." }],
  "reasoning": "<chain-of-thought>",
  "recommendations": ["<remediation if fail>"]
}
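
Because evidence is only as reliable as the JSON it is parsed from, a caller might reject malformed responses before recording anything. A minimal Python sketch, assuming the field names shown above; the `[0, 1]` range for `confidence` and the requirement that a `fail` carries recommendations are this sketch's assumptions, and `parse_review` is a hypothetical name, not Fleet's parser.

```python
import json
from typing import Any

ASSESSMENTS = {"pass", "fail", "inconclusive"}


def parse_review(raw: str) -> dict[str, Any]:
    """Parse and sanity-check an LLM review; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    if data.get("assessment") not in ASSESSMENTS:
        raise ValueError(f"bad assessment: {data.get('assessment')!r}")
    conf = data.get("confidence")
    # Assumption: confidence is a probability-like score in [0, 1].
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError(f"bad confidence: {conf!r}")
    text = data.get("evidence_text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("evidence_text must be a non-empty string")
    for cite in data.get("citations", []):
        if not (isinstance(cite, dict) and isinstance(cite.get("file"), str)
                and isinstance(cite.get("line"), int)):
            raise ValueError(f"bad citation: {cite!r}")
    # Assumption: the schema implies remediation text whenever the assessment is fail.
    if data["assessment"] == "fail" and not data.get("recommendations"):
        raise ValueError("a fail assessment must include recommendations")
    return data
```

If `parse_review` raises, one conservative choice is to leave the finding as needs_review rather than storing partial evidence.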

CI/CD Usage Patterns

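# Static detectors only; Semi findings stay needs_review
fleet scan --llm off --ci

# Local Ollama; the runner must be able to reach FLEET_LLM_OLLAMA_URL
fleet scan --llm ollama --ci

# Claude; expose ANTHROPIC_API_KEY to the job as a CI secret
fleet scan --llm claude --ci

# Static detectors only, with --api-url pointing at a Fleet server
fleet scan --llm off --ci --api-url https://fleet.example.com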
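
Which pattern to use often depends on what the CI job can reach. On GitHub Actions, for example, repository secrets are not passed to workflows triggered by pull requests from forks, so a Claude-backed scan would have no API key there. A minimal Python sketch of falling back between the patterns above; `choose_backend`, `scan_command`, and `run_scan` are hypothetical helpers, and treating a set FLEET_LLM_OLLAMA_URL as a sign that the runner hosts Ollama is an assumption.

```python
import os
import subprocess
from typing import Mapping


def choose_backend(env: Mapping[str, str]) -> str:
    """Pick an --llm value based on what this CI job can reach."""
    if env.get("ANTHROPIC_API_KEY"):
        return "claude"  # the secret is available to this job
    if env.get("FLEET_LLM_OLLAMA_URL"):
        return "ollama"  # assumption: set only on runners that host Ollama
    return "off"  # static detectors only; Semi findings stay needs_review


def scan_command(env: Mapping[str, str]) -> list[str]:
    """Build the fleet invocation using only flags shown in this guide."""
    return ["fleet", "scan", "--llm", choose_backend(env), "--ci"]


def run_scan() -> int:
    """Run the scan in the current environment and return fleet's exit code."""
    return subprocess.run(scan_command(os.environ)).returncode
```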

Evidence Provenance

Every LLM-generated evidence record includes provenance:

{
  "llm_provenance": {
    "backend": "claude",
    "model": "claude-sonnet-4-6",
    "prompt_version": "v1.0.0",
    "token_usage": { "input": 2340, "output": 856 },
    "confidence": 0.92
  }
}

When a prompt is updated and its prompt_version changes, Semi evidence generated with the earlier version is considered stale and should be re-reviewed.
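
Finding stale evidence then amounts to comparing each record's provenance with the current prompt version. A minimal Python sketch, assuming records are dicts shaped like the example above and that a single prompt_version applies across all categories; if versions were tracked per category, the comparison would use a per-category lookup instead. `stale_records` and the `id` field in the usage are hypothetical.

```python
from typing import Iterable, Iterator


def stale_records(records: Iterable[dict], current_prompt_version: str) -> Iterator[dict]:
    """Yield LLM-generated evidence whose prompt_version differs from the current one."""
    for rec in records:
        prov = rec.get("llm_provenance")
        if prov is None:
            continue  # no LLM provenance, e.g. triaged manually
        if prov.get("prompt_version") != current_prompt_version:
            yield rec
```

For instance, after a prompt bump to a new version, `list(stale_records(records, new_version))` gives the queue of Semi evidence to re-review.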

Privacy Considerations

Concern	Ollama (local)	Claude / OpenAI (cloud)
Data leaves machine	No	Yes — sent to the provider API
Secret redaction before sending	Not needed (stays local)	Yes, secrets replaced with named markers
Request logging	Local only	Provider’s retention policy

Before any snippet is sent to a cloud backend, Fleet runs a redaction pass that replaces each detected credential with a named marker such as `__REDACTED_aws_access_key_id_3__`. The pass covers 17 credential classes, including Anthropic, OpenAI, AWS, GitHub, Stripe, Slack, and GCP keys, JWTs, PEM private-key blocks, and credentials embedded in URLs. Markers are mapped back to the original values only in the response, through a reverse map scoped to that single call and never persisted, so the audit trail records that a secret was present without ever storing the secret itself.
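
The redact-and-restore idea can be illustrated in a few lines. This Python sketch is not Fleet's implementation: it covers four illustrative patterns rather than the 17 classes described, and its regular expressions are simplified assumptions. It shows the two properties described above: markers are numbered and named by credential class, and the reverse map lives only in the caller's memory.

```python
from __future__ import annotations

import itertools
import re

# Illustrative subset only (assumption); Fleet's redactor covers 17 classes.
PATTERNS: dict[str, re.Pattern[str]] = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # user:password between "://" and "@" in a URL
    "url_credentials": re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with __REDACTED_<class>_<n>__ and return a reverse map."""
    reverse: dict[str, str] = {}
    counter = itertools.count(1)
    for kind, pattern in PATTERNS.items():
        def to_marker(match: re.Match[str]) -> str:
            marker = f"__REDACTED_{kind}_{next(counter)}__"
            reverse[marker] = match.group(0)  # kept in memory, never written out
            return marker

        text = pattern.sub(to_marker, text)
    return text, reverse


def restore(text: str, reverse: dict[str, str]) -> str:
    """Swap markers in a model response back to the original values."""
    for marker, original in reverse.items():
        text = text.replace(marker, original)
    return text
```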

For fully air-gapped workflows, use the Ollama backend with FLEET_LLM_OLLAMA_URL pointing at the local host (the default): no code leaves the machine at all.