The challenge
A major Australian bank required a platform for managing Apigee shared flows across multiple environments (SIT, UAT, Production). The organisation needed visibility into deployment status, runtime health, error patterns, and change impact across 625+ proxies. No commercial solution existed, and conventional development would have required weeks of planning and dedicated resources.
The approach: AI spec-driven development
The methodology followed four stages:
- Describe. The developer states each capability in plain English, without formal documentation.
- Generate. The model produces complete implementations, including Python modules, API clients, HTML templates, and XML parsers.
- Validate. The output is tested against production data and live APIs covering 625 real proxies.
- Refine. Natural-language feedback directs further functionality within the same session.
What was built: Floey
Floey is a comprehensive platform built from eight core Python modules of 300 to 780 lines each. Its capabilities include:
- Deployment audit and health dashboard. Lists shared flows with revision numbers across environments; scores flows 0–100 on availability, errors, performance, and reliability.
- Error intelligence and blast-radius analysis. Detects recurring error patterns; a parallel scanner evaluates 625 proxies in two to three minutes.
- Performance profiling and cross-environment comparison. Tracks P50/P95/P99 latency across SIT, UAT, and Production with side-by-side analysis.
- Multi-format reporting. Auto-exports JSON, Markdown, HTML dashboards, and colour-coded Excel workbooks.
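The blast-radius analysis in the list above can be sketched as follows. This is a minimal illustration, not Floey's actual code: it assumes each proxy's policies are already loaded in memory as XML strings, and the function names `shared_flows_called` and `blast_radius`, the `max_workers` default, and the sample proxies are all hypothetical. It relies on the Apigee convention that a proxy invokes a shared flow through a FlowCallout policy containing a `SharedFlowBundle` element.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def shared_flows_called(policy_xmls):
    """Return the set of shared flow names referenced by FlowCallout policies."""
    names = set()
    for xml_text in policy_xmls:
        root = ET.fromstring(xml_text)
        if root.tag != "FlowCallout":
            continue  # only FlowCallout policies invoke shared flows
        bundle = root.findtext("SharedFlowBundle")
        if bundle:
            names.add(bundle.strip())
    return names


def blast_radius(shared_flow, proxies, max_workers=16):
    """Return the sorted names of proxies that call `shared_flow`.

    `proxies` maps a proxy name to a list of its policy XML documents
    (a hypothetical in-memory layout chosen for this sketch).
    """
    names = list(proxies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its input
        results = pool.map(shared_flows_called, (proxies[n] for n in names))
        return sorted(n for n, flows in zip(names, results) if shared_flow in flows)


# Hypothetical sample data standing in for real proxy bundles.
AUTH = '<FlowCallout name="FC-Auth"><SharedFlowBundle>auth-common</SharedFlowBundle></FlowCallout>'
LOG = '<FlowCallout name="FC-Log"><SharedFlowBundle>logging</SharedFlowBundle></FlowCallout>'
SAMPLE = {
    "orders-v1": [AUTH],
    "payments-v2": ['<AssignMessage name="AM-1"/>', LOG],
    "accounts-v1": [AUTH, LOG],
}

print(blast_radius("auth-common", SAMPLE))  # ['accounts-v1', 'orders-v1']
```

Threads mainly help when each task waits on I/O, such as downloading a proxy bundle from a management API. For the pure in-memory parsing shown here, the thread pool only illustrates the fan-out structure and would not by itself account for the scan times quoted above.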
Several of Floey's most valuable features were never explicitly requested. The model identified opportunities for improvement and implemented them in its responses without being asked. This is the distinguishing characteristic of AI spec-driven development.
Key takeaways
- Zero human-written code doesn't mean zero human involvement. The developer's domain expertise and quality assessment were essential to success.
- AI goes beyond the spec. Features like bundle caching, cross-org comparison, and smart terminal output were generated proactively.
- Documentation is automatic. Code and documentation stay synchronised because the model authored both.
- The methodology is repeatable. The describe-generate-validate-refine loop scales across domains where humans hold deep expertise.