AI Coding CLI Workflow: From Prompt Chaos to Engineering Rigor
AI CLIs like Claude Code, Gemini CLI, and Cursor accelerate delivery, but they default to happy-path code. Here's the workflow I use to wrap them in production-grade engineering discipline.
Introduction
Terminal-based AI coding agents like Claude Code, Gemini CLI, Qwen CLI, and Cursor accelerate development by translating natural language into working code. But they share a predictable flaw: they optimize for speed and completeness over safety. Without guardrails, they generate happy-path implementations that skip error handling, omit test coverage, ignore observability, and bypass rollback strategies.
The solution is not to restrict the AI, but to structure the interaction. This workflow enforces engineering discipline at every stage, ensuring AI output meets production standards before it reaches version control.
Initial Setup: Anchor AI Behavior & Tool Syntax
Most CLI agents support auto-loading project context files at the start of each session (verify your tool's version for exact behavior). To anchor the AI, place one of the following in your repository root:
- CLAUDE.md (Claude Code)
- .cursorrules (Cursor)
- AGENTS.md or .gemini/config (Gemini CLI, Qwen CLI, custom agents)
Include your tech stack, coding standards, and architectural preferences in the file. This prevents repetitive prompting and ensures consistent behavior across fresh sessions.
💡 Pro tip: Paste the baseline directive below directly into your context file and omit it from individual phase prompts. This saves ~150 tokens/turn and prevents duplication drift.
Baseline Directive for Context File:
You are a senior production engineer. Always prioritize security, observability, and rollback safety over speed. Never skip error handling, tests, or structured logging. Output diffs, not raw files. Ask clarifying questions before assuming edge cases.
Tool-Specific Execution Notes:
- Claude Code: Use /plan to lock the agent into design mode. Reference files with @path to inject exact context.
- Cursor: Use @workspace or @file to scope the agent. Enable Agent Mode for iterative file generation.
- Gemini CLI / Qwen CLI: Pass --context-file PLAN.md or use inline @ references. Some versions support --strict or sandbox flags to reduce speculative code generation; check your CLI docs for availability.
Required Upstream Artifacts:
The workflow assumes pre-existing or stakeholder-provided reference material — user stories, acceptance criteria, business rules, regulatory constraints, and any existing schema or API contracts. Place these in a dedicated directory (e.g., docs/references/) and pass them by @path when prompting Phase 1.
The AI synthesizes PLAN.md from architectural intent plus these upstream references. It does not invent requirements. If a reference is missing, the agent will guess — and those guesses surface as gaps during Phase 2 audit. Cheaper to provide the inputs than to backfill them later.
Typical reference set:
- Product requirements document (PRD) or feature brief
- User stories with acceptance criteria
- Business rules and compliance constraints (PCI, HIPAA, GDPR, UU PDP, etc.)
- Existing schema, API contracts, or integration boundaries (when extending a system)
- Style guide or design tokens (when UI is in scope)
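Because a missing reference turns into a guess, it can help to check the reference set before starting Phase 1. The sketch below assumes the docs/references/ directory mentioned above; the file names in REQUIRED_REFERENCES and the missing_references helper are hypothetical placeholders, not part of any CLI.

```python
from pathlib import Path

# Hypothetical file names; rename them to match your own reference set.
REQUIRED_REFERENCES = ("prd.md", "user-stories.md", "business-rules.md")


def missing_references(reference_dir, required=REQUIRED_REFERENCES):
    """Return the required reference files that are absent from reference_dir."""
    base = Path(reference_dir)
    return [name for name in required if not (base / name).is_file()]
```

Calling missing_references("docs/references") before prompting returns an empty list when every expected input is in place, and otherwise names the files the agent would have to guess at.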
Tiered Workflow Paths
Not every system requires the same rigor. Select a path based on impact and risk:
| Path | Scope | Phases Required | Approval Gate |
|---|---|---|---|
| Full | Customer-facing, financial, or data-critical systems | 1 → 2 → 3 → 4 → 5 → 6 | Security scans, ≥80% coverage, peer review, rollback dry-run |
| Lite | Internal tools, admin dashboards, low-risk MVPs | 1 → 3 → 4 → 5 → 6 | Lint pass, basic unit tests, peer review |
| Emergency | Hotfixes, incident mitigation | 3 → 5 → 6 (compressed) | Targeted test, security scan, post-incident review |
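One way to keep the tier choice explicit is to encode the table as data, so scripts can ask which phases apply to a given path. This is a minimal sketch; the names WORKFLOW_PATHS, phases_for, and is_phase_required are illustrative assumptions.

```python
# Phase sequences copied from the table above.
WORKFLOW_PATHS = {
    "full": (1, 2, 3, 4, 5, 6),
    "lite": (1, 3, 4, 5, 6),
    "emergency": (3, 5, 6),
}


def phases_for(path):
    """Return the ordered phase numbers required for a workflow path."""
    key = path.strip().lower()
    if key not in WORKFLOW_PATHS:
        raise ValueError(f"unknown workflow path: {path!r}")
    return WORKFLOW_PATHS[key]


def is_phase_required(path, phase):
    """True if the given phase number is part of the chosen path."""
    return phase in phases_for(path)
```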
Phase 1: Architecture & Planning
Define the system before generating code. AI performs best when given explicit boundaries, compliance requirements, and a clear separation between design and implementation.
We are building a production-ready application.
Stack: [specify]. Constraints: [specify]. Compliance/Security: [if applicable].

Generate a PLAN.md covering:
1. Architecture and core components
2. Database schema and migration strategy
3. Authentication and authorization pattern
4. Error handling and structured logging
5. Testing pyramid (unit, integration, e2e)
6. CI/CD and deployment strategy
7. Observability (metrics, tracing, alerting)
8. Known risks and rollback plan

Do not write code. Focus on technical design that can be audited. Output in Markdown format.
Phase 2: Critical Audit
Open a fresh session to eliminate confirmation bias. Treat the new context as an independent reliability and security auditor. The agent must only produce an audit report without modifying the original plan.
Read PLAN.md in the root. Audit it from a production-readiness perspective.
Use these benchmarks: 12-Factor App, OWASP Top 10, SRE fundamentals.

Identify:
- Security and data privacy gaps
- Missing test coverage strategy
- Single points of failure
- Deployment and rollback risks

Output as AUDIT.md. Do not modify PLAN.md. Provide concrete recommendations, not theoretical advice.
Phase 3: Task Breakdown
Consolidate PLAN.md and AUDIT.md back into your primary session. Convert the validated architecture into an executable checklist. Each item must contain acceptance criteria and rollback instructions.
Based on PLAN.md and AUDIT.md, create TASKS.md.
Format per task:
- Task ID: T-001
  Scope: ...
  Acceptance Criteria: ...
  Test Strategy: ...
  Dependencies: ...
  Rollback Step: ...
  Estimated Complexity: Low, Med, or High

Prioritize by dependency and risk. Maximum 15 initial tasks. Ready for incremental execution.
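Since every task must carry acceptance criteria and a rollback step, the generated TASKS.md can be checked mechanically before execution starts. The sketch below assumes the agent follows the per-task format in the prompt with one "Field: value" pair per line; parse_tasks and validate_tasks are hypothetical helpers, and multi-line field values are not handled.

```python
import re

REQUIRED_FIELDS = (
    "Scope", "Acceptance Criteria", "Test Strategy",
    "Dependencies", "Rollback Step", "Estimated Complexity",
)
ALLOWED_COMPLEXITY = {"Low", "Med", "High"}
MAX_TASKS = 15  # the cap stated in the prompt

# Matches "Field: value", optionally preceded by a list dash.
FIELD_LINE = re.compile(r"^\s*(?:-\s*)?([A-Za-z ]+?):\s*(.*)$")


def parse_tasks(text):
    """Parse TASKS.md text into a list of {field: value} dicts, one per task."""
    tasks = []
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if not match:
            continue
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Task ID":
            tasks.append({"Task ID": value})  # a new task starts here
        elif tasks:
            tasks[-1][key] = value
    return tasks


def validate_tasks(tasks):
    """Return a list of problems; an empty list means the checklist is usable."""
    problems = []
    if len(tasks) > MAX_TASKS:
        problems.append(f"{len(tasks)} tasks exceeds the limit of {MAX_TASKS}")
    for task in tasks:
        task_id = task.get("Task ID") or "<missing id>"
        for field in REQUIRED_FIELDS:
            if not task.get(field):
                problems.append(f"{task_id}: missing {field}")
        complexity = task.get("Estimated Complexity")
        if complexity and complexity not in ALLOWED_COMPLEXITY:
            problems.append(f"{task_id}: invalid complexity {complexity!r}")
    return problems
```

A non-empty result is a reason to send the checklist back to the agent rather than patch it by hand, which keeps TASKS.md the single source of scope.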
Phase 4: Documentation Consolidation
Once TASKS.md locks scope, generate durable handoff documentation. The AI consumes the validated plan, audit, and task list, then produces two artifacts split by audience.
Why after task breakdown, not before:
- PLAN.md already covers architecture, schema, and auth, and Phase 2 validates them.
- Pre-breakdown docs drift every time tasks pivot. Post-breakdown docs reflect locked scope.
- Functional and Technical docs are deliverables, not planning inputs. They serve Product, QA, Support, and future engineers, not the audit gate.
Read PLAN.md, AUDIT.md, and TASKS.md.

Generate FUNCTIONAL.md:
- User stories grouped by feature
- Acceptance criteria per story
- Business rules and constraints
- Edge cases and error states
- Audience: Product, QA, Support

Generate TECHNICAL.md:
- System architecture diagram (mermaid)
- Database schema and relationships
- API contracts (endpoints, payloads, error codes)
- Authentication and authorization flow
- Observability surface (logs, metrics, traces)
- Rollback and recovery procedures
- Audience: Engineering, SRE

Flag any inconsistency between PLAN.md and TASKS.md before generating.
Output as two separate Markdown files. Do not modify upstream artifacts.
💡 Tip: Treat these as living documents. Re-run this prompt after any scope change in TASKS.md so the docs match the shipped state. Store them alongside PLAN.md and AUDIT.md in your documentation directory.
Phase 5: Atomic Execution & Pass/Fail Gates
AI CLI agents fragment focus when handling large features. Break implementation into atomic commits. Each prompt should reference a single task ID, enforce diff-only output, and require validation before moving to the next item.
Context Budget & Resume Pattern:
- Cap sessions at ~80% of the model's context window.
- Before rotating, generate CONTEXT_SUMMARY.md. Use this skeleton to anchor the next session:
  ## Current Progress: [Task/Phase completed]
  ## Open Decisions: [Unresolved architectural or logic choices]
  ## Next Steps: [Exact next task ID and file targets for new session]
- Resume new sessions by passing CONTEXT_SUMMARY.md as the initial context anchor.
💡 Tip: Most CLIs display token usage in the session header. If yours doesn't, cap planning prompts to ~3k tokens and execution prompts to ~2k to stay safely under truncation thresholds. For a deeper dive on cutting token burn across CLI agents, see Cutting AI Coding Agent Token Burn 75%+.
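The rotation rule and the summary skeleton can be captured in a few lines. This is a sketch only: should_rotate and render_context_summary are hypothetical helpers, and the token counts are assumed to come from whatever usage figure your CLI reports.

```python
ROTATE_AT = 0.80  # the ~80% cap from the resume pattern above


def should_rotate(tokens_used, context_window, threshold=ROTATE_AT):
    """True once the session has consumed the threshold share of the window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_used / context_window >= threshold


def render_context_summary(progress, open_decisions, next_steps):
    """Fill in the CONTEXT_SUMMARY.md skeleton for the next session."""
    return (
        f"## Current Progress: {progress}\n"
        f"## Open Decisions: {open_decisions}\n"
        f"## Next Steps: {next_steps}\n"
    )
```

For example, with a hypothetical 200,000-token window, should_rotate(170_000, 200_000) signals that it is time to write the summary and start a fresh session.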
Execute TASKS.md item [Task ID].
Follow these rules:
- Implement only the requested scope
- Include unit and integration tests matching the test strategy
- Apply structured logging and proper error handling
- Output changes as unified diffs only
- Provide verification steps before marking the task complete

Do not modify unrelated files. Wait for human review before proceeding.
💡 CLI Tip: If your agent prints diffs to stdout, redirect them to a patch file and validate it before applying: run ai-cli "Execute T-001" > t001.patch, then git apply --check t001.patch.
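The "do not modify unrelated files" rule can also be checked against the saved patch itself. The sketch below reads a unified diff and reports paths outside an allowed set; files_touched and out_of_scope are hypothetical helpers, and the header detection is a heuristic rather than a full diff parser.

```python
def files_touched(diff_text):
    """Return the set of file paths named in a unified diff's headers."""
    lines = diff_text.splitlines()
    paths = set()
    for i in range(len(lines) - 2):
        # A file header is a '--- ' line, then a '+++ ' line, then a hunk.
        is_header = (
            lines[i].startswith("--- ")
            and lines[i + 1].startswith("+++ ")
            and lines[i + 2].startswith("@@")
        )
        if not is_header:
            continue
        for header in (lines[i], lines[i + 1]):
            path = header[4:].split("\t", 1)[0].strip()
            if path == "/dev/null":  # one side of a created or deleted file
                continue
            if path.startswith(("a/", "b/")):  # git's default path prefixes
                path = path[2:]
            paths.add(path)
    return paths


def out_of_scope(diff_text, allowed_paths):
    """Return touched paths that are not in allowed_paths, sorted."""
    return sorted(files_touched(diff_text) - set(allowed_paths))
```

If out_of_scope returns anything for a single-task patch, that is a signal to reject the patch and re-prompt with a tighter scope instead of reviewing the extra changes.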
Measurable Pass/Fail Gates:
- Test Coverage: ≥80% for new/modified files (Full), ≥60% (Lite)
- Security Scans: 0 critical/high vulnerabilities
- Lint & Type Check: 0 blocking errors
- Manual Approval: Required PR review from 1 senior engineer
- Rollback Script: Present and tested in staging
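These gates are concrete enough to evaluate in a script. The following is a minimal sketch, assuming the metrics have already been collected from your CI tooling; GateInputs and failed_gates are hypothetical names, and only the Full and Lite tiers are modeled because they are the only ones with a stated coverage floor.

```python
from dataclasses import dataclass

# Coverage floors from the gate list above; the Emergency path has no
# stated figure, so it is deliberately absent.
COVERAGE_FLOOR = {"full": 80.0, "lite": 60.0}


@dataclass
class GateInputs:
    coverage_percent: float          # coverage of new/modified files
    critical_or_high_vulns: int      # findings from the security scan
    blocking_lint_errors: int        # lint and type-check errors
    senior_approvals: int            # PR approvals from senior engineers
    rollback_tested_in_staging: bool


def failed_gates(inputs, tier="full"):
    """Return a list of human-readable gate failures (empty means pass)."""
    key = tier.strip().lower()
    if key not in COVERAGE_FLOOR:
        raise ValueError(f"no coverage floor defined for tier: {tier!r}")
    floor = COVERAGE_FLOOR[key]
    failures = []
    if inputs.coverage_percent < floor:
        failures.append(
            f"coverage {inputs.coverage_percent:.1f}% is below {floor:.0f}%"
        )
    if inputs.critical_or_high_vulns > 0:
        failures.append("critical/high vulnerabilities present")
    if inputs.blocking_lint_errors > 0:
        failures.append("blocking lint or type errors present")
    if inputs.senior_approvals < 1:
        failures.append("missing senior engineer approval")
    if not inputs.rollback_tested_in_staging:
        failures.append("rollback script not tested in staging")
    return failures
```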
Phase 6: Human Code Review & Merge
AI output is draft code until validated by human judgment. After passing all gates:
- Run git diff or use your IDE's PR view to inspect every change line-by-line.
- Verify that tests cover the new logic and that no regressions were introduced.
- Check for hardcoded values, missing error paths, or over-engineered abstractions.
- Squash or merge via protected branch rules. Never skip peer review for production branches.
Safe Review & Enforceable Git Rules
Verbal guardrails fail under pressure. Enforce AI behavior through deterministic commands and repository policies.
Review AI Changes Safely: Modern AI CLIs edit files directly in your working tree and can read git context natively. Instead of manual patch application, use:
# 1. Review all AI-modified files before staging
git diff
# 2. Stage only verified changes
git add -p # Interactive staging to review hunks individually
# 3. Commit with clear, scoped messages
git commit -m "feat: implement T-001 with tests and error handling"
Enforceable Git Hooks & Policies:
- Local Pre-commit Guard: Run this one-liner before committing to block untested AI diffs:
  files=$(git diff --name-only --diff-filter=d HEAD); npm run lint && { [ -z "$files" ] || npm test -- --findRelatedTests $files; }
  This assumes a Jest-based npm test script, since --findRelatedTests is a Jest option that requires at least one file path. The guard lists staged and unstaged changes against HEAD, excludes deleted files, and skips the test run when nothing changed. (Replace with your stack's equivalent: ./mvnw verify or gradle check for Spring Boot, ruff check . && pytest for Python, or go vet ./... && go test ./... for Go.) This validates changes locally without complex setup. For mature teams, migrate to .pre-commit-config.yaml with automated linting, secret scanning, and coverage gates.
- Branch Protection Rules: Require status checks (CI, coverage, security scans) and pull request approvals before merging. Disable force-push and direct commits to main/release branches.
- AI Execution Constraint: Configure CLI wrappers to disable native git commit/git push commands. Route all state changes through human-reviewed patch application.
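How a wrapper blocks git commit and git push depends on the CLI, but the core check is small. The sketch below is one possible filter a wrapper could call before running a command the agent proposes; is_blocked and git_subcommand are hypothetical helpers, the list of value-taking global options is not exhaustive, and it does not catch git aliases or git invoked through an absolute path.

```python
import shlex

BLOCKED_SUBCOMMANDS = {"commit", "push"}
# Global git options that take a separate value argument (not exhaustive).
OPTIONS_WITH_VALUE = {"-C", "-c", "--git-dir", "--work-tree", "--namespace"}


def git_subcommand(argv):
    """Return the git subcommand in argv, or None if argv is not a git call."""
    if not argv or argv[0] != "git":
        return None
    args = iter(argv[1:])
    for arg in args:
        if arg in OPTIONS_WITH_VALUE:
            next(args, None)  # skip the option's value
        elif not arg.startswith("-"):
            return arg  # first non-option token is the subcommand
    return None


def is_blocked(command_line):
    """True if the shell-style command line is a blocked git invocation."""
    return git_subcommand(shlex.split(command_line)) in BLOCKED_SUBCOMMANDS
```

A filter like this complements branch protection rather than replacing it: the server-side rules remain the enforcement point, and the local check only keeps the agent from creating commits that a human has not staged.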
Provenance & Audit Logging
AI-generated code must be traceable for compliance and incident response.
- Log every CLI interaction using tee or structured logging: your-ai-cli-command | tee logs/ai-session-$(date +%s).log
- Include prompt version, model name, context files loaded, and output diffs in each log entry.
- Store PLAN.md, AUDIT.md, TASKS.md, FUNCTIONAL.md, TECHNICAL.md, and CONTEXT_SUMMARY.md in a dedicated docs/ai-audit/ directory. Commit them alongside code changes for full traceability.
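For the structured-logging option, each interaction can be recorded as one JSON line carrying the fields listed above. This is a sketch under assumptions: build_log_entry and to_json_line are hypothetical helpers, the field names are illustrative, and the SHA-256 digest of the diff is an addition that makes later tamper checks easier.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_log_entry(prompt_version, model_name, context_files, diff_text, now=None):
    """Build one provenance record for a single CLI interaction."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "prompt_version": prompt_version,
        "model": model_name,
        "context_files": sorted(context_files),
        # Digest lets a reviewer confirm the logged diff was not altered.
        "diff_sha256": hashlib.sha256(diff_text.encode("utf-8")).hexdigest(),
        "diff": diff_text,
    }


def to_json_line(entry):
    """Serialize an entry as a single JSON line suitable for appending to a log."""
    return json.dumps(entry, sort_keys=True)
```

Appending these lines to a file under logs/ keeps the record greppable and lets incident responders match a commit to the prompt version and model that produced it.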
Quick-Reference: Production Hardening Rules
| Area | Standard | AI Behavior | Human Responsibility |
|---|---|---|---|
| Security | OWASP Top 10, least-privilege access, secret rotation | Rejects hardcoded credentials, enforces input validation | Reviews auth flows, validates access policies, scans dependencies |
| Observability | Structured JSON logs, distributed tracing, health endpoints | Injects correlation IDs, adds span propagation, formats logs | Defines SLOs, configures alerting thresholds, reviews dashboards |
| Testing | Unit, integration, and e2e coverage; failure-path validation | Generates mocks, asserts edge cases, verifies test execution | Reviews test coverage, validates flaky tests, approves suites |
| Deployment | Incremental releases, immutable build artifacts, rollback scripts | Outputs infrastructure diffs, flags configuration drift | Approves deployments, runs dry-runs, monitors rollout metrics |
| Execution | Atomic commits, feature flags, single-scope changes | Outputs unified diffs, isolates changes, blocks multi-step merges | Applies patches, runs CI, reviews PRs, triggers merge |
Reality Check: What AI Cannot Replace
AI amplifies engineering effort; it does not replace engineering judgment. Always perform manual review before deploying:
- Auth and payment flows frequently miss race conditions, idempotency guarantees, and token rotation logic.
- Data migrations are often irreversible; schema changes can cause downtime or silent corruption.
- Infrastructure and secrets require environment-specific tuning. AI-generated configurations expose systems if deployed without validation.
- Edge cases like memory leaks, graceful shutdowns, rate limiting, and exponential backoff demand human stress testing.
- Compliance frameworks (like data privacy or financial regulations) require architectural and legal verification that AI cannot guarantee.
Never deploy without automated security scans, load testing, secret rotation validation, and rollback dry-runs. Treat AI CLI agents as disciplined implementors guided by senior engineering oversight. You remain the final authority on technical trade-offs and production safety.
Conclusion
Production readiness does not emerge from prompt sophistication. It emerges from explicit context, audited architecture, security guardrails injected from day one, version-controlled planning, and incremental execution with measurable acceptance criteria. By treating AI CLI agents as structured engineering collaborators rather than autonomous code generators, you reclaim predictability, reduce technical debt, and ship systems that survive real-world traffic.
If you're applying this workflow to a real system and want a second pair of eyes on your architecture plan, reach out to compare notes—I'm always happy to share stack-specific adjustments that keep AI output reliably shippable.
Written by Erik Yuntantyo · Software Engineer