Kubernetes vs Docker Swarm vs Nomad: Container Orchestration Showdown 2026

My team’s internal tooling cluster hit a wall in late January. We had about 20 services running across two Docker hosts — managed manually with docker-compose and a lot of wishful thinking. One host was at 80% memory constantly. In December alone we had two separate incidents where someone (okay, me) accidentally stopped a critical service while trying to restart another. Time to actually fix this.

The usual debate followed: Kubernetes, Docker Swarm, or Nomad? I’d used K8s at a previous job — a fintech with 500+ pods in production — so I knew what it could do. I also knew what it cost to operate. My current team is four engineers. We don’t have a dedicated platform person. So the question wasn’t just “what’s most capable” — it was “what can we actually run without it becoming a second job?”

I spent two weeks — end of January through mid-February 2026 — deploying the same reference workload to all three. Same apps, same node count (3 VMs on Hetzner, 4 vCPUs / 8GB RAM each), same basic requirements: rolling deployments, service discovery, persistent storage for one stateful service, basic health checks. Here’s what I found.

My Test Setup, So You Can Calibrate

The workload was: four stateless Go APIs, two Python/Celery background workers with moderate CPU, one Redis instance needing persistence, one PostgreSQL that needed careful scheduling, and Caddy as a reverse proxy. Nothing exotic.

I used Terraform to provision the VMs and Ansible for initial configuration. All orchestrator-specific config I wrote by hand — no Helm charts pulled from the internet, no pre-built Nomad job packs. I wanted to understand the actual primitives, not a curated abstraction on top of them.

For each tool I measured: time to initial working cluster, time to deploy a config change, behavior during a simulated node failure (kill the Docker daemon on one worker), and roughly how long it took me to debug the first non-obvious problem.
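The recovery timings below came from a crude poll-and-time loop rather than anything fancy. A minimal sketch of the helper, assuming a POSIX shell; the health endpoint in the comment is illustrative:

```shell
# wait_until: repeatedly run a command until it succeeds, then print
# the elapsed seconds. Used to time recovery after killing a node.
wait_until() {
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Example harness usage (endpoint is a placeholder): after stopping the
# Docker daemon on a worker, time how long until the service answers again.
#   ssh worker-2 'sudo systemctl stop docker'
#   wait_until curl -fsS http://api-gateway.internal/healthz
```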

Kubernetes: You Get a Lot, and You Pay for It Every Day

K8s is still the default answer when someone asks me what to use — and I have complicated feelings about that.

Getting a three-node cluster up with kubeadm took about 90 minutes, including the usual kubeadm init ceremony, joining workers, and deploying a CNI plugin (Flannel — it just works, I wasn’t here to optimize the network layer). The kubectl experience is familiar at this point. The declarative YAML model makes sense once you’ve internalized it.
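For reference, the ceremony is roughly the following; exact flags and the Flannel manifest URL are from memory, so treat them as assumptions and check the current kubeadm docs:

```shell
# On the control-plane node. Flannel expects this pod CIDR by default.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker, using the token and hash printed by `kubeadm init`:
sudo kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>

# Back on the control plane, deploy the CNI plugin:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```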

Here’s a stripped-down deployment for one of my Go services:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: internal-tools
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
        - name: api-gateway
          image: registry.internal/api-gateway:v0.14.2
          ports:
            - containerPort: 8080
          # Without resource limits, one bad deploy will starve your neighbors.
          # Learn this before prod teaches it to you.
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10

Clean. Readable. Works exactly as expected.

The problem is everything around this. Secret management: Kubernetes Secrets are base64-encoded, not encrypted at rest by default, which means you’re either configuring etcd encryption, reaching for Vault, or using the External Secrets Operator. Fine — but that’s another system to learn, deploy, and operate. Persistent volumes: I used local-path-provisioner for the test, but in production you’d want a CSI driver wired to your cloud provider. Ingress: pick a controller, configure it, then debug annotations for an hour. Monitoring: Prometheus and Grafana, or you’re paying for managed observability.
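The base64 point is worth seeing concretely, because people routinely mistake it for encryption. It is encoding, trivially reversible by anyone with read access to the Secret:

```shell
# What a Kubernetes Secret actually stores: base64, not ciphertext.
# (printf avoids the trailing newline that `echo` would encode.)
printf '%s' 's3cr3t-password' | base64
# czNjcjN0LXBhc3N3b3Jk

printf '%s' 'czNjcjN0LXBhc3N3b3Jk' | base64 -d
# s3cr3t-password
```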

None of this is hard individually. But it compounds. Four-person team, no dedicated infra engineer — you feel that weight in your maintenance load every week.

One thing I noticed: K8s documentation has genuinely improved. The structured tutorials are solid. But the ecosystem is so large that you’ll still spend hours reading GitHub issues for edge cases. I hit a specific problem where my StatefulSet for PostgreSQL was interacting weirdly with node taints — I’d tainted one node to prefer stateful workloads, and the scheduler was doing something unexpected with the tolerations. Took me about two and a half hours to track down. Not K8s’s fault exactly, but it’s the kind of hole you fall into, and it happens more than you’d like.
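The subtlety that cost me most of those two and a half hours: a toleration only *permits* a pod to land on a tainted node; it does nothing to attract it there. Steering requires node affinity on top. A sketch of the combo, with placeholder taint and label keys:

```yaml
# Taint keeps everything else OFF the node:
#   kubectl taint nodes worker-3 workload=stateful:NoSchedule
spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "stateful"
      effect: "NoSchedule"
  # The toleration only allows this node; affinity is what
  # actually steers the pod toward it.
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "node-role/stateful"
                operator: In
                values: ["true"]
```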

Rolling deployments worked flawlessly. Node failure handling was exactly right — pods rescheduled within roughly 30 seconds. The platform itself is solid. I’m not disputing that.

If your team has more than 10 engineers, uses a major cloud provider with managed K8s (EKS, GKE Autopilot specifically), and has at least one person who owns infrastructure — use K8s. The managed versions eliminate the control plane headaches that make self-hosted K8s expensive for small teams. Self-hosted K8s for a four-person team is a different calculation entirely.

Docker Swarm: The Reliable Friend Who’s Stuck in 2019

I went in with genuine optimism. Swarm has a reputation for simplicity, and after K8s’s setup surface area, that sounded good.

The initial setup really is simple:

# On manager node
docker swarm init --advertise-addr <MANAGER_IP>

# On each worker (token comes from the above output)
docker swarm join --token <SWARM_TOKEN> <MANAGER_IP>:2377

# Deploy from a Compose file you probably already have
docker stack deploy -c docker-compose.yml internal-tools

That’s it. You’re running a cluster. If your team already lives in docker-compose, this is a low-friction upgrade. The mental model transfers almost completely.

And then you start hitting the edges.

Look, the feature gap with K8s is documented everywhere, but here’s what actually bit me: rolling update controls are limited (you get update_parallelism and update_delay, not K8s’s maxUnavailable/maxSurge granularity). Secret rotation has no built-in story beyond docker secret create, which is manual. Health check integration works but isn’t as flexible as readiness/liveness probe logic in K8s.
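For what it's worth, the controls that do exist live under `deploy.update_config` in the Compose file, and this fragment is roughly the full extent of them (values illustrative):

```yaml
services:
  api-gateway:
    image: registry.internal/api-gateway:v0.14.2
    deploy:
      replicas: 2
      update_config:
        parallelism: 1        # update one task at a time
        delay: 10s            # pause between batches
        order: start-first    # start the new task before stopping the old
        failure_action: rollback
        monitor: 30s          # how long to watch a new task for failure
      rollback_config:
        parallelism: 1
```

`order: start-first` in particular trades a brief window of old and new running side by side for never dropping below the replica count, which matters for the failure mode I describe below.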

The bigger issue — the one I couldn’t get past — is that Swarm’s development has effectively stalled. Docker’s ownership history is complicated (Mirantis now owns the engine), and the GitHub issues tell the story if you look at response times and the absence of a meaningful roadmap. That’s not a knock on the maintainers — the ecosystem reality is what it is — but betting infrastructure on a platform with no visible roadmap is a genuine operational risk.

I pushed a config change on a Friday afternoon — I know, I know — and hit something where a service failed to update cleanly and dropped to zero replicas for about four minutes before I caught it. I’ve tried to reproduce it and can’t reliably, so it might have been a transient daemon issue. But it shook my confidence in a way that’s hard to reason past.

For simple stateless services on a small team? Swarm is probably fine today. But I found myself mentally listing all the workarounds I’d need as requirements grew, and the list got long fast.

Nomad: I Didn’t Expect to Like This as Much as I Did

Honest confession: I almost didn’t include Nomad in this test. My mental model was “the thing HashiCorp makes that isn’t Terraform or Vault.” I included it reluctantly, mostly for completeness.

Two weeks later, it’s the option I keep coming back to.

Nomad is a general-purpose workload scheduler — containers are one driver among several, not the foundation. It can schedule Docker containers, raw executables, Java JARs, and system-level jobs. For my use case that breadth is mostly irrelevant, but it tells you something about the design: they thought carefully about the scheduling problem first and added container support on top, rather than the other way around.

The HCL job spec is the most readable orchestration config I’ve written in years:

job "api-gateway" {
  datacenters = ["dc1"]
  type        = "service"

  group "api" {
    count = 2

    # Nomad spreads replicas across nodes by default — no affinity
    # rules required to get basic availability.
    network {
      port "http" { to = 8080 }
    }

    task "gateway" {
      driver = "docker"

      config {
        image = "registry.internal/api-gateway:v0.14.2"
        ports = ["http"]
      }

      resources {
        cpu    = 500  # MHz
        memory = 256  # MB
      }

      # Consul integration is native. Service registration
      # lives right here in the job spec, not in a separate manifest.
      service {
        name = "api-gateway"
        port = "http"

        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Because Nomad integrates natively with Consul for service discovery and Vault for secrets, you get a coherent operational story across all three — without assembling K8s’s ecosystem from pieces. I expected this to feel like HashiCorp lock-in. Instead it felt like someone had actually designed these tools to work together, which turns out to matter a lot when you’re debugging at 11pm.
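A sketch of what the Vault side looks like inside a task; the secret path and policy name here are made up, and this assumes the Nomad servers are already wired to a Vault cluster:

```hcl
task "gateway" {
  driver = "docker"

  vault {
    policies = ["api-gateway-read"]   # placeholder policy name
  }

  # Nomad renders this template with secrets fetched under its own
  # Vault token, then injects the result as environment variables.
  template {
    data        = <<EOF
DB_PASSWORD={{ with secret "secret/data/api-gateway" }}{{ .Data.data.password }}{{ end }}
EOF
    destination = "secrets/db.env"
    env         = true
  }
}
```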

The UI is genuinely good. Better than the default K8s dashboard, more informative than anything Swarm ships with. Rolling deployments with canary support worked cleanly. When I killed the Docker daemon on a worker node, Nomad rescheduled the affected tasks in about 20 seconds. Which brings me to the thing that actually surprised me — the day-two operational experience is noticeably lower-friction than K8s. I thought about what I was doing less. I debugged fewer abstractions. I spent more time on the actual applications.
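The canary behavior comes from the job's `update` stanza. A sketch of the shape I used, values illustrative:

```hcl
update {
  max_parallel     = 1
  canary           = 1      # deploy one canary allocation first
  min_healthy_time = "10s"
  healthy_deadline = "2m"
  auto_revert      = true   # roll back if the new version never goes healthy
  auto_promote     = false  # promote by hand: nomad deployment promote <id>
}
```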

What Nomad doesn’t have: K8s’s ecosystem depth. If you need a specific operator pattern — say, for Kafka, Elasticsearch, or a cloud provider’s custom resource — you’re probably building it yourself or going without. The community is smaller. When you hit a strange edge case, you may end up reading the source code on GitHub before finding an answer. I’m not 100% sure how it scales beyond 50 nodes with complex mixed workloads; I’ve seen reports of teams running it at that scale, but I haven’t done it myself and I won’t pretend otherwise.

What I Would Actually Deploy

Enough hedging. Here is the real answer.

Small team — under 10 engineers, no dedicated platform function, mix of stateful and stateless workloads: Nomad plus Consul. The operational simplicity is genuine, not just marketing. The config is readable by people who didn’t write it. The Vault integration means you’re not stitching together four separate tools for secrets. The smaller ecosystem is a real constraint, but small teams shouldn’t be running 40 different operators anyway. Keep it simple, know what you’re running.

Team on a managed cloud with more than 10 engineers: EKS, GKE, or AKS — not self-hosted K8s. Managed control planes eliminate the bulk of what makes K8s expensive to operate. At that point the ecosystem depth becomes genuinely valuable, the tooling around compliance and observability is better than anything else available, and you’re paying for capability you’ll actually use.

Docker Swarm: I wouldn’t start a new project on it today. If you’re already running it and it’s stable, you don’t have to migrate tomorrow — but I’d be planning the migration.

Anyway, the orchestration decision matters less than people make it sound, right up until it really matters. The real variable is whether your team actually understands what they’re running and can debug it at 2am with confidence. Start with the simplest thing that genuinely meets your requirements. For most small teams, that’s Nomad. Most mid-to-large teams on cloud infrastructure should be on managed K8s, full stop. And if someone tells you Docker Swarm is the right choice for a greenfield project in 2026, ask them when they last looked at the commit history.
