Targets Configuration
Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.
Structure
```yaml
targets:
  - name: azure_base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}

  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base

  - name: local_agent
    provider: cli
    command_template: 'python agent.py --prompt {PROMPT}'
    judge_target: azure_base
```

Environment Variables
Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

```yaml
targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}
```

This keeps secrets out of version-controlled files.
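The substitution can be pictured with a short Python sketch. It is an illustration only, not AgentV's actual loader: the helper name `interpolate`, the exact placeholder grammar, and the choice to raise on a missing variable are all assumptions.

```python
import re

# Matches ${{ NAME }} with optional whitespace inside the braces (assumed grammar).
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env: dict[str, str]) -> str:
    """Replace every ${{ NAME }} in value with env[NAME] (hypothetical helper)."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: a missing variable is an error, not an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(lookup, value)


# Placeholder values standing in for the contents of a .env file.
env = {"ANTHROPIC_API_KEY": "sk-example", "ANTHROPIC_MODEL": "example-model"}
print(interpolate("${{ ANTHROPIC_MODEL }}", env))  # prints: example-model
```

Because the replacement happens when the targets file is loaded, the YAML itself only ever contains variable names.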
Supported Providers
| Provider | Type | Description |
|---|---|---|
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude-code | Agent | Claude Code CLI |
| codex | Agent | Codex CLI |
| pi-coding-agent | Agent | Pi Coding Agent |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command |
| mock | Testing | Mock provider for dry runs |
Referencing Targets in Evals
Set the default target at the top level or override per case:

```yaml
# Top-level default
execution:
  target: azure_base

tests:
  - id: test-1
    # Uses azure_base

  - id: test-2
    execution:
      target: vscode_dev  # Override for this case
```

Judge Target
Agent targets that need LLM-based evaluation specify a judge_target: the LLM used to run LLM judge evaluators.

```yaml
targets:
  - name: codex_target
    provider: codex
    judge_target: azure_base  # LLM used for judging
```

Workspace Template
For agent targets, workspace_template specifies a directory that gets copied to a temporary location before each test runs. This provides isolated, reproducible workspaces.

```yaml
targets:
  - name: claude_code
    provider: claude-code
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base
```

When workspace_template is set:
- The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/<test-id>/
- The .git directory is skipped during copy
- Each test gets its own isolated copy
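The copy behaviour listed above can be approximated in a few lines of Python. This is a sketch under assumptions, not AgentV's implementation: `copy_template` is a hypothetical helper, and the demo writes to a throwaway directory rather than ~/.agentv/workspaces.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(template: Path, workspaces_root: Path, eval_run_id: str, test_id: str) -> Path:
    """Copy template into <root>/<eval-run-id>/<test-id>/, skipping .git (hypothetical helper)."""
    destination = workspaces_root / eval_run_id / test_id
    # ignore_patterns(".git") drops any entry named .git at every level of the tree.
    shutil.copytree(template, destination, ignore=shutil.ignore_patterns(".git"))
    return destination


# Demo with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "my-project"
    (template / ".git").mkdir(parents=True)
    (template / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
    (template / "README.md").write_text("hello\n")

    root = Path(tmp) / "workspaces"
    first = copy_template(template, root, "run-123", "case-01")
    second = copy_template(template, root, "run-123", "case-02")

    copied_names = sorted(p.name for p in first.iterdir())
    isolated = first != second and second.is_dir()

print(copied_names)  # ['README.md']
print(isolated)      # True
```

Each test ID maps to its own destination directory, so changes made by one test cannot leak into another.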
Workspace Setup/Teardown
Run scripts before and after each test using the workspace block. It can be defined at the suite level (applies to all tests) or per test (overrides the suite-level definition).

```yaml
workspace:
  template: ./workspace-templates/my-project
  setup:
    script: ["bun", "run", "setup.ts"]
    timeout_ms: 120000
    cwd: ./scripts
  teardown:
    script: ["bun", "run", "teardown.ts"]
    timeout_ms: 30000
```

| Field | Description |
|---|---|
| template | Directory to copy as workspace (alternative to target-level workspace_template) |
| setup | Script to run after workspace creation, before the agent runs |
| teardown | Script to run after evaluation, before cleanup |
Each script config accepts:
| Field | Description |
|---|---|
| script | Command array (e.g., ["bun", "run", "setup.ts"]) |
| timeout_ms | Timeout in milliseconds (default: 60000 for setup, 30000 for teardown) |
| cwd | Working directory (relative paths resolved against eval file directory) |
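To make the three fields concrete, here is a rough Python sketch of how a runner might execute one script config. `run_script` is a hypothetical helper; falling back to the eval file directory when cwd is omitted, and treating a non-zero exit code as failure, are assumptions rather than documented behaviour.

```python
import json
import subprocess
import sys
from pathlib import Path

# Defaults taken from the table above.
DEFAULT_TIMEOUT_MS = {"setup": 60000, "teardown": 30000}


def run_script(kind: str, config: dict, eval_dir: Path, context: dict) -> subprocess.CompletedProcess:
    """Run a setup or teardown script config (hypothetical helper)."""
    timeout_ms = config.get("timeout_ms", DEFAULT_TIMEOUT_MS[kind])
    # Relative cwd values are resolved against the eval file's directory.
    # Assumption: with no cwd, the eval file's directory itself is used.
    cwd = eval_dir / config["cwd"] if "cwd" in config else eval_dir
    return subprocess.run(
        config["script"],
        input=json.dumps(context),   # case context is passed on stdin
        capture_output=True,
        text=True,
        cwd=cwd,
        timeout=timeout_ms / 1000,   # subprocess.run expects seconds
        check=True,                  # non-zero exit raises CalledProcessError
    )


# Demo: a tiny inline script that echoes the test ID it received on stdin.
echo = [sys.executable, "-c", "import sys, json; print(json.load(sys.stdin)['test_id'])"]
result = run_script("setup", {"script": echo}, Path.cwd(), {"test_id": "case-01"})
print(result.stdout.strip())  # case-01
```

Passing the command as an array avoids shell quoting issues, which is presumably why the config uses that form.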
Lifecycle order: template copy → setup script → git baseline → agent runs → file changes captured → teardown script → cleanup
Error handling:
- Setup failure aborts the test with an error result
- Teardown failure is non-fatal (warning only)
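The two error-handling rules can be expressed as a small control-flow sketch. The function `run_case` and the shape of the result dictionary are invented for illustration, and the sketch assumes teardown is not attempted when setup fails; AgentV's real runner may differ.

```python
from typing import Callable


def run_case(setup: Callable[[], None], agent: Callable[[], str], teardown: Callable[[], None]) -> dict:
    """Run one test case with the error-handling rules above (hypothetical helper)."""
    warnings: list[str] = []
    try:
        setup()
    except Exception as exc:
        # Setup failure aborts the test: the agent never runs.
        return {"status": "error", "error": f"setup failed: {exc}", "warnings": warnings}

    output = agent()

    try:
        teardown()
    except Exception as exc:
        # Teardown failure is non-fatal: record a warning and keep the result.
        warnings.append(f"teardown failed: {exc}")

    return {"status": "ok", "output": output, "warnings": warnings}


def broken() -> None:
    raise RuntimeError("exit code 1")


print(run_case(broken, lambda: "patched", lambda: None)["status"])    # error
print(run_case(lambda: None, lambda: "patched", broken)["warnings"])  # ['teardown failed: exit code 1']
```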
Script context: Both scripts receive a JSON object on stdin with case context:
```json
{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": {
    "repo": "sympy/sympy",
    "base_commit": "abc123"
  }
}
```

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
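On the script side, consuming the stdin context might look like the following Python sketch. The helpers `read_context` and `describe` are hypothetical; only the field names come from the payload shown earlier. The demo feeds the sample payload through an in-memory stream, whereas a real script would pass `sys.stdin`.

```python
import io
import json
from typing import TextIO


def read_context(stream: TextIO) -> dict:
    """Parse the JSON case context from the given stream (sys.stdin in a real script)."""
    return json.load(stream)


def describe(context: dict) -> str:
    """Build a one-line summary from fields in the documented payload."""
    metadata = context.get("case_metadata", {})
    return (
        f"{context['test_id']}: {metadata.get('repo')} @ "
        f"{metadata.get('base_commit')} in {context['workspace_path']}"
    )


# Stand-in for stdin, carrying the sample payload from this section.
sample = io.StringIO(json.dumps({
    "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
    "test_id": "case-01",
    "eval_run_id": "run-123",
    "case_input": "Fix the bug",
    "case_metadata": {"repo": "sympy/sympy", "base_commit": "abc123"},
}))
summary = describe(read_context(sample))
print(summary)
# case-01: sympy/sympy @ abc123 in /home/user/.agentv/workspaces/run-123/case-01
```

A setup script could use case_metadata in this way to check out the right repository and commit inside workspace_path before the agent starts.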
Workspace Fingerprinting
After setup and git baseline initialization, AgentV computes a SHA-256 fingerprint of the workspace file tree. This fingerprint is included in the evaluation result as workspaceFingerprint and can be used to verify that workspaces are reproducible across runs.
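The exact hashing scheme is not specified here, so the following Python sketch shows only one plausible way to fingerprint a file tree with SHA-256. The `fingerprint` helper, the sorted path-plus-content layout, and the exclusion of .git are all assumptions; the value it produces will not match AgentV's workspaceFingerprint.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(workspace: Path) -> str:
    """Hash relative paths and file contents in sorted order (assumed scheme)."""
    digest = hashlib.sha256()
    files = [
        p for p in workspace.rglob("*")
        # Assumption: git metadata is excluded so only working files count.
        if p.is_file() and ".git" not in p.relative_to(workspace).parts
    ]
    # Sort by POSIX-style relative path so the order is stable across platforms.
    for path in sorted(files, key=lambda p: p.relative_to(workspace).as_posix()):
        digest.update(path.relative_to(workspace).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


# Two workspaces with identical content should produce identical fingerprints.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for root in (Path(a), Path(b)):
        (root / "src").mkdir()
        (root / "src" / "main.py").write_text("print('hi')\n")
    same = fingerprint(Path(a)) == fingerprint(Path(b))
    (Path(b) / "src" / "main.py").write_text("print('bye')\n")
    changed = fingerprint(Path(a)) != fingerprint(Path(b))

print(same, changed)  # True True
```

Comparing fingerprints from two runs of the same test is a quick way to detect a setup script that is not deterministic.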
Cleanup Behavior
By default:
- Success: Workspace is cleaned up automatically
- Failure: Workspace is preserved for debugging
Override with CLI flags:
- --keep-workspaces: Always preserve workspaces
- --cleanup-workspaces: Always clean up, even on failure
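The default behaviour and the two flags reduce to a small decision function, sketched below in Python. `should_cleanup` is a hypothetical name, and rejecting both flags together is an assumption; the real CLI may resolve that combination differently.

```python
def should_cleanup(success: bool, keep_workspaces: bool = False, cleanup_workspaces: bool = False) -> bool:
    """Decide whether a workspace is deleted after a test (hypothetical helper)."""
    if keep_workspaces and cleanup_workspaces:
        # Assumption: the two flags conflict; the actual CLI behaviour is not documented here.
        raise ValueError("--keep-workspaces and --cleanup-workspaces cannot be combined")
    if keep_workspaces:
        return False        # always preserve
    if cleanup_workspaces:
        return True         # always clean up, even on failure
    return success          # default: clean up on success, preserve on failure


print(should_cleanup(success=True))                            # True
print(should_cleanup(success=False))                           # False
print(should_cleanup(success=True, keep_workspaces=True))      # False
print(should_cleanup(success=False, cleanup_workspaces=True))  # True
```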
cwd vs workspace_template
| Option | Use Case |
|---|---|
| cwd | Run in an existing directory (shared across tests) |
| workspace_template | Copy template to temp location (isolated per case) |
These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
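As a rough illustration of these rules, the Python sketch below picks a working-directory mode for a target. `resolve_working_dir` and its mode labels are hypothetical, and resolving relative paths against the eval file's directory is an assumption borrowed from the script cwd field.

```python
from pathlib import Path
from typing import Optional


def resolve_working_dir(
    eval_file: Path,
    cwd: Optional[str] = None,
    workspace_template: Optional[str] = None,
) -> tuple[str, Path]:
    """Return (mode, path) for a target's working directory (hypothetical helper)."""
    if cwd is not None and workspace_template is not None:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    base = eval_file.parent
    if cwd is not None:
        return ("shared", base / cwd)                    # existing directory, reused by every test
    if workspace_template is not None:
        return ("isolated", base / workspace_template)   # template to copy once per test
    return ("eval-dir", base)                            # neither set: the eval file's directory


eval_file = Path("evals") / "suite.yaml"
print(resolve_working_dir(eval_file)[0])                                    # eval-dir
print(resolve_working_dir(eval_file, cwd="repo")[0])                        # shared
print(resolve_working_dir(eval_file, workspace_template="templates/a")[0])  # isolated
```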