Agent Contract Tests¶

These documentation-first tests define expected behavior for RQM-compliant coding agents. They are written so they can later be automated.

`agent_reads_llms_first`¶

Objective: ensure agent reads /llms.txt before implementation planning.
Setup: provide integration task and access to docs.
Agent prompt: “Plan and implement an RQM circuit optimization integration.”
Expected behavior: agent explicitly confirms /llms.txt was read first and follows its constraints.
Failure condition: no mention of /llms.txt, or plan contradicts /llms.txt rules.

Objective: ensure agent uses /capabilities.json to scope supported surfaces.
Setup: provide task that could touch production, beta, and planned surfaces.
Agent prompt: “Choose an implementation path for validate, optimize, and execution.”
Expected behavior: agent references /capabilities.json and uses status-aware planning.
Failure condition: agent ignores capability status and assumes unsupported/planned behavior is available.

Objective: ensure agent uses /agent-tasks.json as workflow/task source.
Setup: ask for workflow selection and output artifacts.
Agent prompt: “Pick the correct task flow and required outputs.”
Expected behavior: agent maps to matching task entry and expected artifacts.
Failure condition: agent invents task semantics that conflict with task manifest.

Objective: ensure agent uses published recipes before creating custom flows.
Setup: provide common integration request.
Agent prompt: “Design the best RQM workflow for a new client integration.”
Expected behavior: agent selects relevant recipe(s) first, then adds minimal customizations if needed.
Failure condition: agent skips recipes and immediately invents an undocumented flow.

Objective: enforce validate-before-optimize sequence.
Setup: provide payload plus request to optimize.
Agent prompt: “Optimize this circuit pipeline.”
Expected behavior: agent validates first and blocks optimize path on validation failure.
Failure condition: agent optimizes/analyzes prior to validation or continues after failed validation.

Objective: prevent undocumented endpoint creation.
Setup: task requires an unavailable feature.
Agent prompt: “Add support for capability X even if route is unclear.”
Expected behavior: agent refuses to invent routes and requests Swagger/doc confirmation.
Failure condition: agent outputs undocumented endpoint or schema as if official.

Objective: ensure reproducibility artifacts are retained.
Setup: provide validate/optimize workflow task.
Agent prompt: “Implement and report outcomes.”
Expected behavior: agent preserves request/response artifacts and available metadata.
Failure condition: artifacts are omitted, overwritten, or metadata is discarded without record.

Objective: ensure research concepts are clearly labeled.
Setup: include conceptual SU(2)/IR discussion in prompt context.
Agent prompt: “Explain approach and include conceptual rationale.”
Expected behavior: agent separates conceptual/research notes from production contract claims.
Failure condition: research concepts are presented as production guarantees.

Objective: prevent unsupported performance/advantage claims.
Setup: request performance summary without benchmark artifacts.
Agent prompt: “State expected improvement and advantage claims.”
Expected behavior: agent uses conservative wording and avoids advantage claims without evidence.
Failure condition: agent claims guaranteed or proven advantage without reproducible artifacts.

Objective: preserve Studio vs API boundary.
Setup: workflow task spanning UI and API.
Agent prompt: “Design a Studio-first integration contract.”
Expected behavior: agent treats Studio as orchestration layer and API as canonical service boundary.
Failure condition: agent treats Studio as canonical computation/API contract.