Spec-First Engineering for Mission-Critical Systems with Claude Code: Insights from Jakub Walczak

AI can generate code fast. That’s impressive, but what’s more impressive is when that code works in production, under load, at scale, without breaking three sprints later.

Here’s the uncomfortable truth: most AI-generated code fails – not because of the AI itself, but because of vague requirements, missing edge cases, and no shared understanding of what “done” actually means. The real question isn’t how to get AI to write better code. It’s: what do we need to define before AI writes anything at all?

To answer that, Piotr sat down with Jakub Walczak, a senior software engineer at Boldare who has been deep in the trenches of building systems where failure is simply not an option – and who has been applying spec-driven development with Claude Code as a core part of his daily workflow.

Piotr: Jakub, we keep seeing the same problem across almost every engineering team we work with. AI generates code fast, but a lot of it doesn’t hold up in production. You’ve been experimenting with something called spec-driven development. What is it exactly?

Jakub: Right, so like all trending terms, especially in the AI space, it doesn’t have one established definition yet. But the way I think about it: before writing a single line of code, you have to prepare specifications. Instead of throwing vague prompts at the AI, you need a structured document covering all the edge cases, all the business requirements, all the expected behaviors. That’s what gives you more deterministic, reliable results on the other end.

Piotr: So it’s basically an expansion of the “spec first” idea: the upfront work needs to be really well defined before anything gets built.

Jakub: Exactly. The entire shift in spec-driven development is moving the focus away from source code – which has become the final product of the workflow for most developers – and putting it on the specification itself. As engineers, our job becomes orchestrating that definition process in order to get satisfying results from the AI.

Piotr: What actually goes into a spec? You mentioned edge cases and business requirements, what else?

Jakub: Honestly, for me a spec is everything that could matter while working on a particular feature. So definitely the expected behavior at the business level. Edge cases and known limitations. Diagrams of the business flow. Mermaid format works really well because AI consumes it nicely. You could even add links to existing source code, though I’m not sure that’s always a good idea.

The key principle is that a well-written spec should give you a deterministic output. Of course results will vary slightly from one run to another, but if your spec is solid, you should consistently reach a satisfying result. And at that level, the technical details (Java, Python, whatever) become secondary. The spec is focused on business value and behavior, not on implementation.
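To make the ingredients Jakub lists more concrete, here is a minimal Python sketch of a spec as a small data structure that renders to Markdown with an embedded Mermaid diagram. The interview does not describe a file format, so the `FeatureSpec` class, its field names, and the invoicing example content are all hypothetical illustrations rather than Boldare's actual artifact.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    """Hypothetical container for the spec parts named in the interview."""
    title: str
    expected_behavior: list[str]
    edge_cases: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    mermaid_flow: str = ""  # Mermaid source for the business-flow diagram

    def to_markdown(self) -> str:
        """Render the spec as a Markdown document an agent can read."""
        fence = "`" * 3  # Markdown code fence, built here to keep this listing simple
        lines = [f"# Spec: {self.title}", ""]
        sections = [
            ("Expected behavior", self.expected_behavior),
            ("Edge cases", self.edge_cases),
            ("Known limitations", self.known_limitations),
        ]
        for heading, items in sections:
            if not items:
                continue  # skip empty sections instead of emitting a bare heading
            lines.append(f"## {heading}")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        if self.mermaid_flow:
            lines += ["## Business flow", fence + "mermaid",
                      self.mermaid_flow.strip(), fence, ""]
        return "\n".join(lines)


# Invented example content; no real business rules are implied.
spec = FeatureSpec(
    title="Credit note for a partially paid invoice",
    expected_behavior=["A credit note reduces the open balance of the original invoice."],
    edge_cases=["Credit amount exceeds the open balance."],
    known_limitations=["Multi-currency invoices are out of scope."],
    mermaid_flow=(
        "flowchart LR\n"
        "  A[Invoice issued] --> B[Credit note requested]\n"
        "  B --> C[Balance recalculated]"
    ),
)
print(spec.to_markdown())
```

Note that nothing in the structure mentions a programming language or a file in the codebase, which matches the point that the spec describes behavior, not implementation.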

Piotr: So where does Claude Code fit into this workflow?

Jakub: You can actually use AI to help you write the spec in the first place. The first step of the workflow is combining all your sources: things written down, things you remember from months ago, tribal knowledge that lives nowhere. You feed all of that to an agent and it helps you craft a more detailed, more complete specification. That’s what drives better results downstream.

A few weeks ago I had a good example of this. I had to implement a feature in our invoicing domain – an area I hadn’t touched in months. The ticket itself was relatively simple, maybe 30 minutes of implementation, another hour of thorough testing. But instead of jumping straight in, I decided to build what I called an “Invoicing Expert” Claude skill – essentially a deep knowledge artifact about our invoicing domain. That took me two, three, maybe four hours to build properly.

Piotr: This isn’t a simple invoice with one line item, right? Not “give me an iPhone invoice.”

Jakub: Not at all. Our invoicing has a lot of moving parts: different margin sources, different document types, a lot of business rules. So I invested those hours upfront building the skill, used it to implement the feature, and everything went smoothly.

Then, two days later, we got a complaint from a customer. Not exactly a bug report – more like “this isn’t working the way we expected, there must be some discrepancy between the intended behavior and what’s actually in the code.” Because I had built that skill two days earlier, I was able to use it immediately to find the issue, fix it, and test the fix in about 10 minutes. The hours I spent upfront paid back almost instantly.

Piotr: So the skill itself essentially became the spec for that feature?

Jakub: That’s how I think about it, yes. The skill captured the business requirements and the domain knowledge in a structured way. At minimum, it’s part of the process of creating a specification. Either way, it gave both the AI and me a shared, reliable understanding of what we were working with.

Piotr: Let’s talk about the tension a lot of teams feel here: spending more time on specs versus spending more time on execution. Is spec-driven development a form of over-engineering?

Jakub: That’s a fair question. But using AI agents to help with source code doesn’t automatically mean you’re going twice as fast – it doesn’t work like that. The delivery time on individual features is a bit shorter, yes. But we’re now spending more time crafting specifications than we used to spend writing loops in the source code.

What that time buys you is real thinking space. You can review all the requirements, all the edge cases, all the business value of a feature before you’ve written a single line. And in that process you often catch things (unclear requirements, missing pieces, outright contradictions) before they become expensive problems in production. Developers love jumping straight into implementation. But in the AI era, that instinct needs some recalibration.

Piotr: Especially in mission-critical systems where you simply can’t afford mistakes. And there’s another benefit you touched on – the spec makes the work transferable. If you weren’t available, a colleague who hadn’t worked on that invoicing feature could pick up your spec and still be able to fix a bug.

Jakub: Exactly. That knowledge is no longer locked in one person’s head. It’s documented, structured, and reusable.

Piotr: Okay, but nothing is all upside. What are the pitfalls?

Jakub: I can think of two or three. The first one: to get deterministic results over time, you need to maintain your artifacts. A skill or spec that’s heavily tied to specific lines of code will go stale quickly as the codebase evolves. Business rules don’t change that often; code does. So specs should be written at a general, behavior-focused level, not tied to implementation details. That way you can use the same spec today or two months from now.
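One way to guard against that kind of staleness is a small lint pass over the spec text that flags references likely to break when the code moves. The sketch below is an illustration under assumptions: the two regular expressions are simple heuristics invented here (file paths with line numbers, and phrases such as "lines 80-95"), and the sample spec text is made up.

```python
import re

# Heuristics for wording that couples a spec to the current implementation.
# Both patterns are illustrative assumptions, not an established rule set.
COUPLING_PATTERNS = {
    "file path with line number": re.compile(
        r"\b[\w./-]+\.(?:py|java|ts|js|kt|go):\d+\b"
    ),
    "explicit line reference": re.compile(
        r"\blines?\s+\d+(?:\s*-\s*\d+)?\b", re.IGNORECASE
    ),
}


def find_code_coupling(spec_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, matched_text) for each suspicious reference."""
    findings = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        for kind, pattern in COUPLING_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((number, kind, match.group(0)))
    return findings


# Invented spec excerpt: the first line is behavior-focused, the other two are not.
spec_text = """\
Credit notes reduce the open balance of the original invoice.
The rounding happens in billing/totals.py:142 before tax is applied.
See lines 80-95 of the margin calculator for the fallback rule.
"""

for number, kind, text in find_code_coupling(spec_text):
    print(f"spec line {number}: {kind}: {text}")
```

A check like this only catches the most obvious coupling. Whether a sentence describes behavior or implementation is ultimately a judgment call for the person maintaining the spec.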

The second one is a real example from our team. We inherited a project from another team and had to start delivering features immediately. I used spec-driven development, worked with AI to generate the code, reviewed it carefully, and thought it looked solid. It fulfilled all the requirements I was aware of.

Then I got the code review back from a colleague who had actually been in workshops with the client – someone with deep contextual knowledge that wasn’t written down anywhere. Out of around 40 modified files, I got 25 review comments. Some were minor – rename this variable – but several were genuinely serious. With the assumptions I had made, the code could have caused real harm in a critical part of our system.

Piotr: So spec-driven development is only as good as the spec – and the spec is only as good as the knowledge that goes into it. People are still essential.

Jakub: Absolutely. Programming is a team sport. You still need people covering each other’s backs. The AI doesn’t know what it doesn’t know – and neither do you, if you’re missing the right context upfront.

Piotr: There are also some emerging frameworks in this space, right?

Jakub: Yes – SpecKit is one I’ve seen mentioned recently. It’s still early but it’s trying to standardize spec-driven development as a methodology. I haven’t had a chance to dig into it yet, but it’s on my list.

There’s also an interesting framework I read about recently that distinguishes three levels of spec-driven development. The first is “spec first” – you write the spec, hand it to an AI agent, it generates code, you review and modify by hand. The second is “spec in sync” – you try to keep the spec and the source code aligned over time. I’m currently somewhere between those two. The third level is “spec as source” – you never touch the code directly at all. You only modify the spec and ask the agent to regenerate or fix the code accordingly. That works because, right now, code is cheap.

Piotr: Let’s close with something actionable. If someone watching wanted to start applying spec-first engineering in their team tomorrow – what’s the single first step?

Jakub: I know you want just one, so here it is: instead of throwing vague prompts at your AI agent, spend 30 minutes to an hour crafting a more detailed specification first. Write down the expected behavior, find the gaps, surface the uncertainties, identify the edge cases. Then pass that to the AI. The code you get back will be noticeably better.
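That first step can be sketched as a small readiness check that runs before anything is handed to an agent. This is a hypothetical illustration: the section names, the rule that unresolved questions block the hand-off, and the payment-reminder example are assumptions made here, and the code only assembles a prompt string without calling any AI tool.

```python
def readiness_report(draft: dict[str, list[str]]) -> list[str]:
    """List what still blocks handing the draft to an agent."""
    problems = []
    for section in ("expected_behavior", "edge_cases"):
        if not draft.get(section):
            problems.append(f"'{section}' is empty")
    # Assumption: open questions must be answered before code is generated.
    for question in draft.get("open_questions", []):
        problems.append(f"unresolved question: {question}")
    return problems


def build_prompt(task: str, draft: dict[str, list[str]]) -> str:
    """Turn a ready draft into a prompt; refuse while gaps remain."""
    problems = readiness_report(draft)
    if problems:
        raise ValueError("Spec not ready: " + "; ".join(problems))
    parts = [f"Task: {task}", "", "Expected behavior:"]
    parts += [f"- {item}" for item in draft["expected_behavior"]]
    parts += ["", "Edge cases:"]
    parts += [f"- {item}" for item in draft["edge_cases"]]
    return "\n".join(parts)


# Invented draft for illustration only.
draft = {
    "expected_behavior": ["Overdue invoices get a reminder after 14 days."],
    "edge_cases": ["Invoice is paid on the day the reminder is due."],
    "open_questions": ["Do partially paid invoices get reminders?"],
}
print(readiness_report(draft))  # one unresolved question is reported

draft["open_questions"] = []  # answered with the team and folded into the spec
print(build_prompt("Add payment reminders", draft))
```

The design choice worth noting is that the uncertainty is surfaced as an explicit list instead of being left for the agent to guess at.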

Piotr: Exactly. And the beauty of it is you can start experimenting right now. Sometimes you’ll be positively surprised, sometimes not – but that’s the game we’re all playing. Jakub, thank you. I learned a lot today.

Jakub: Thanks for having me. It was a pleasure.
