Hello and welcome to this Deep Dive Thursday! I’m Lauro Müller, and I’m super happy to have you here with me 😄 Let’s spend a few minutes sharpening our skills and learning something new and relevant for running production systems, shall we? Today we will explore in detail why Deployments fall short for production releases, and how Argo Rollouts provides a considerably more robust setup for safe releases. Ready to get started? Let’s go!
Kubernetes Deployments are good at one thing: replacing Pods without taking the application down all at once.
That is a big win compared to creating Pods or ReplicaSets directly, but when it comes to production releases, Deployments have several shortcomings.
A Pod can become Ready, join the Service, and still break things the moment real traffic hits it.
And that’s a big problem. Although Kubernetes sees a healthy Pod, your actual users may see failed requests, weird latency spikes, or a feature that looked fine in testing but falls apart in production. Not the best UX, right?
Why Deployments Are Not Enough
This raises the question: why are Deployments not enough?
As it turns out, a plain Deployment is more geared towards availability during Pod replacement.

Its controls, such as maxSurge and maxUnavailable, help Kubernetes move from one ReplicaSet to the next without dropping too much capacity. That is already an improvement, but it tackles only how Pods should be replaced, not whether they should be replaced at all.
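To make those two knobs concrete, here is a minimal sketch of a Deployment that sets them. The name, labels, image, and the specific values are placeholders I picked for illustration, not a recommendation.

```yaml
# Hypothetical Deployment; name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%       # extra Pods allowed above the desired count during the update
      maxUnavailable: 0   # never drop below the desired count of available Pods
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:2.0.0
```

Notice that both settings describe capacity during the swap. Nothing in this manifest says anything about how the new version behaves once it serves traffic.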
Production releases need a second question: should this new version keep receiving more traffic?
That is where Deployments start to fall short.
Readiness is not release success. Readiness probes are narrow by design. They tell Kubernetes whether a Pod is ready to receive traffic, not whether the new version is actually behaving well for users.
A health endpoint can return 200 while database credentials are broken, downstream calls are failing, or latency has doubled under load.
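A typical readiness probe looks roughly like the fragment below. The path, port, and timings are assumptions for illustration; the point is how little the probe actually asks.

```yaml
# Fragment of a Pod template; path, port, and timings are assumed values.
containers:
  - name: my-app
    image: registry.example.com/my-app:2.0.0
    readinessProbe:
      httpGet:
        path: /healthz      # hypothetical health endpoint
        port: 8080
      periodSeconds: 5      # probe every 5 seconds
      failureThreshold: 3   # mark the Pod NotReady after 3 consecutive failures
```

As long as that endpoint answers with a success status code, the Pod stays in the Service, no matter what real requests are experiencing.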
"Hey Lauro, but can't we create a setup where the readiness probe hits an endpoint that gives more confidence about whether the new Pods are bug-free?" We surely can create an endpoint that is better than just sending up or 200 back, but aiming to replicate more complex scenarios would quickly become a coding and maintenance nightmare. How would we encode multiple business rules in a single endpoint? How would we try to capture all the operations our users perform against our real system? Tough to work with, right?
Not only that, but Deployments also do not give you a first-class progressive delivery workflow.
What do I mean by that? Once a rolling update starts, Kubernetes keeps moving as long as readiness allows it. If something subtle breaks, the usual response is a manual rollback, which is just another rolling update in reverse. There is no built-in pause step, no human approval gate, and certainly no automated analysis running.
Argo Rollouts addresses these gaps by replacing the Deployment with a Rollout resource.
Instead of one continuous update, you define explicit release steps. You can send only part of the traffic to the new version, pause, inspect what is happening, and then either continue or stop. You can even run automated analyses to double-check that your metrics are all within acceptable boundaries before promoting the release further!
Just as a side note: If you’re already excited to learn more about Argo Rollouts, fear not! I do have a complete Argo CD and Argo Rollouts course, and we cover many important aspects of working with both tools in production environments. Make sure to check that out! It’s a great way to get some value back and support the platform so that I can produce more awesome content 😄
What Rollouts Change
The core idea of a canary deployment is simple: do not expose all users to a new version at once.
Start with a small slice of traffic. Observe the result. Then decide whether the release deserves more exposure. This is where Argo Rollouts delivers the most value.
Side note: in this Deep Dive we’re talking about canary releases, but Argo Rollouts also supports blue-green deployments, in case you want to explore that!

A release becomes a sequence of deliberate checkpoints instead of a controller racing from old Pods to new Pods. You can define a rollout that sends 20% of traffic to the canary, pauses for five minutes, then moves to 50% only if the signals still look good.
If error rate or latency degrades, you abort before the failure reaches everyone.
This also changes the definition of rollbacks (at least when compared to what a Deployment considers rollbacks to be). With a plain Deployment, rollback is a recovery action after the update has already spread. With Rollouts, limited exposure is built into the release process from the start.
Another important detail: Argo Rollouts can use automated analysis, such as Prometheus-based checks, to decide whether promotion should continue.
That gives you a much better definition of release success than Pod readiness alone. The version is not considered safe because the container started. It is considered safe because the service still behaves within acceptable thresholds.
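Here is a rough sketch of what such a Prometheus-based check can look like. Treat the Prometheus address, the metric name, the labels, and the 95% threshold as assumptions; your own metrics will differ.

```yaml
# Hypothetical AnalysisTemplate; address, query, and threshold are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m          # take one measurement per minute
      count: 5              # five measurements in total
      failureLimit: 1       # tolerate at most one failed measurement
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
---
# Fragment of a Rollout's canary strategy that runs the analysis as a step.
steps:
  - setWeight: 20
  - analysis:
      templates:
        - templateName: success-rate
      args:
        - name: service-name
          value: my-app
  - setWeight: 50
```

If the analysis fails, the Rollout aborts instead of moving on to the 50% step.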
Why 20% Is Not Always 20%
There is one caveat that is easy to miss: a canary percentage is not always an exact traffic percentage.
If you use Argo Rollouts without traffic management (Gateway API, Traefik, Istio, and the like), the controller makes a best-effort approximation based on replica counts.
That works reasonably well at higher replica counts, but it becomes coarse very quickly. With ten replicas, 20% is easy to approximate: two canary Pods out of ten. With two replicas, it is not even possible, because the only splits available are 0%, 50%, and 100%.
In that case, teams often think they are testing 20% of traffic when they are really sending roughly half of it to the canary, since one out of two Pods runs the new version.
That is why advanced Rollouts setups use traffic management, for example Gateway API integration, to enforce precise routing. At that point, canary steps stop being approximate and start reflecting the traffic split you actually intended.
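As a hedged sketch, this is roughly how a canary strategy can point at a Gateway API HTTPRoute. It assumes the Argo Rollouts Gateway API traffic router plugin is installed and registered with the controller, and the Service and HTTPRoute names are placeholders.

```yaml
# Fragment of a Rollout spec; Service and HTTPRoute names are placeholders.
# Assumes the Gateway API traffic router plugin is set up in the controller.
strategy:
  canary:
    stableService: my-app-stable   # Service selecting the stable Pods
    canaryService: my-app-canary   # Service selecting the canary Pods
    trafficRouting:
      plugins:
        argoproj-labs/gatewayAPI:
          httpRoute: my-app-route  # HTTPRoute whose backend weights get adjusted
          namespace: default
    steps:
      - setWeight: 20
      - pause: {}
```

With a setup like this, the weight is applied at the routing layer, so 20% can mean 20% of requests even when only a couple of Pods are running.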
A Small Upgrade You Can Make Today
Now that we know the benefits of Rollouts over Deployments and we really want to use Rollouts, the great news is that migrating between them is super easy. After installing Argo Rollouts in your cluster (you can do that with their lovely Helm Chart), all you have to do is:
Change the Deployment manifest to a Rollout manifest;
Add an explicit traffic shift strategy to the Rollout. Here is one example of a straightforward one (quick challenge: can you guess how a Rollout would progress with this configuration?)
strategy:
  canary:
    steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 50
      - pause: { duration: 5m }

Wait… What did we just code here?
When the rollout starts, it will direct 20% of the traffic to the new version.
It will then pause indefinitely: a human will need to open the Dashboard or send a CLI command (or a more generic API command, for that matter) to trigger the continuation of the Rollout.
The Rollout will then direct 50% of the traffic to the new version.
It will once again pause, but now not indefinitely: it will wait 5 minutes, and then proceed to send 100% of the traffic to the new version.
To be clear, this is not a full production policy. Nonetheless, it is still a meaningful improvement. Instead of replacing everything as fast as Pods become Ready, you create two checkpoints where the release can be observed before promotion continues. Isn’t that cool?
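For reference, here is how the strategy could sit inside a complete Rollout manifest. This is only a sketch: the name, labels, replica count, and image are placeholders, and everything outside the strategy block mirrors what you would already have in a Deployment.

```yaml
# Hypothetical Rollout; name, labels, replicas, and image are placeholders.
apiVersion: argoproj.io/v1alpha1   # was apps/v1 in the Deployment
kind: Rollout                      # was Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:2.0.0
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {}                # wait for a manual promotion
        - setWeight: 50
        - pause: { duration: 5m }  # wait, then continue automatically
```

If you have the Argo Rollouts kubectl plugin installed, running kubectl argo rollouts promote my-app is one way to resume the indefinite pause.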
Thanks for reading this issue through! If you want to build this end to end, I'd recommend you take a look at my Argo CD and Argo Rollouts course. The Argo Rollouts section will take you through the progression from Deployment limits to Rollout strategies, traffic management, and automated promotion decisions.
I hope you enjoyed this issue, and let me know if you have any ideas or would like to see a specific topic covered here!
