Skip to content

LLM Configuration Guide

The LLM layer reviews Semi findings from static detectors and produces regulatory-quality evidence text. It is optional — scanning works without it, but Semi findings remain as needs_review until triaged manually or by LLM.

BackendPrivacyQualityCostSetup
Ollama (local)Data never leaves machineGood (model-dependent)FreeInstall Ollama + pull model
Claude (Anthropic)Data sent to Anthropic APIExcellentPer-tokenSet ANTHROPIC_API_KEY
OpenAI (compatible)Data sent to providerVery goodPer-tokenSet OPENAI_API_KEY
OffN/AN/AFreeDefault
  1. Install Ollama: https://ollama.com
  2. Pull a model (minimum 8B for code review, 70B recommended):
Terminal window
ollama pull llama3.1:8b # Fast, adequate
ollama pull llama3.1:70b # Recommended for thorough review
ollama pull codellama:34b # Code-specialized alternative
  1. Run scan:
Terminal window
fleet scan --path . --llm ollama

Environment variables:

Terminal window
export FLEET_LLM_OLLAMA_URL=http://localhost:11434 # Default
export FLEET_LLM_OLLAMA_MODEL=llama3.1:70b
Terminal window
export ANTHROPIC_API_KEY=sk-ant-api03-...
fleet scan --path . --llm claude

Models (via FLEET_LLM_CLAUDE_MODEL):

  • claude-sonnet-4-6 — Default. Good balance of quality and speed.
  • claude-opus-4-6 — Deepest analysis. Best for critical assessments.

Works with OpenAI, Azure OpenAI, vLLM, Together, LM Studio, or any OpenAI-compatible endpoint.

Terminal window
export OPENAI_API_KEY=sk-...
export FLEET_LLM_OPENAI_BASE_URL=https://api.openai.com/v1 # Default
export FLEET_LLM_OPENAI_MODEL=gpt-4o
fleet scan --path . --llm openai

For Azure OpenAI:

Terminal window
export OPENAI_API_KEY=<azure-key>
export FLEET_LLM_OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment
export FLEET_LLM_OPENAI_MODEL=gpt-4o

For local vLLM:

Terminal window
export OPENAI_API_KEY=dummy
export FLEET_LLM_OPENAI_BASE_URL=http://localhost:8000/v1
export FLEET_LLM_OPENAI_MODEL=meta-llama/Llama-3.1-70B-Instruct

Each requirement category has a specialized system prompt grounded in CRA Annex I language. Prompts are versioned (v1.0.0) for evidence traceability.

Categories with custom prompts:

  • CRYPTO — Algorithm approval, key sizes, modes of operation
  • NET — Transport security, credential handling, attack surface
  • AUTH — Password storage, session management, JWT validation
  • INPUT — Injection prevention, XSS, path traversal
  • STOR — Encryption at rest, access controls
  • LOG — Event coverage, data protection, structured format
  • UPD — Update integrity, signature verification, rollback
  • CONFIG — Secure defaults, debug mode
  • AI — Model integrity, prompt injection, data exposure

The LLM responds with structured JSON:

{
"assessment": "pass | fail | inconclusive",
"confidence": 0.92,
"evidence_text": "<regulatory-quality paragraph>",
"citations": [{ "file": "...", "line": 42, "snippet": "..." }],
"reasoning": "<chain-of-thought>",
"recommendations": ["<remediation if fail>"]
}
Terminal window
fleet scan --llm off --ci
fleet scan --llm ollama --ci
fleet scan --llm claude --ci
fleet scan --llm off --ci --api-url https://fleet.example.com

Every LLM-generated evidence record includes provenance:

{
"llm_provenance": {
"backend": "claude",
"model": "claude-sonnet-4-6",
"prompt_version": "v1.0.0",
"token_usage": { "input": 2340, "output": 856 },
"confidence": 0.92
}
}

When prompts are updated (new prompt_version), previously generated Semi evidence is considered stale and should be re-reviewed.

ConcernOllama (local)Claude / OpenAI (cloud)
Data leaves machineNoYes — sent to the provider API
Secret redaction before sendNot needed (stays local)Yes — secrets redacted to named markers
Request loggingLocal onlyProvider’s retention policy

Before any snippet is sent to a cloud backend, Fleet runs a redaction pass that replaces 17 classes of credential — Anthropic / OpenAI / AWS / GitHub / Stripe / Slack / GCP keys, JWTs, PEM private-key blocks, and credentials embedded in URLs — with named markers such as __REDACTED_aws_access_key_id_3__. The original values are restored only in the response, via a call-scoped reverse map that is never persisted, so the audit trail records that a secret was present without ever storing the secret itself.

For fully air-gapped workflows, the Ollama backend keeps everything on the machine — no code leaves the host at all.