{"id":408,"date":"2026-03-09T18:20:35","date_gmt":"2026-03-09T18:20:35","guid":{"rendered":"https:\/\/blog.rebalai.com\/en\/2026\/03\/09\/docker-compose-vs-kubernetes-when-to-use-which-in\/"},"modified":"2026-03-18T22:00:05","modified_gmt":"2026-03-18T22:00:05","slug":"docker-compose-vs-kubernetes-when-to-use-which-in","status":"publish","type":"post","link":"https:\/\/blog.rebalai.com\/en\/2026\/03\/09\/docker-compose-vs-kubernetes-when-to-use-which-in\/","title":{"rendered":"Docker Compose vs Kubernetes: What I Actually Learned Running Both in Production"},"content":{"rendered":"<p>Eighteen months ago I inherited a mess. A four-person team had built a reasonably capable ML inference service \u2014 three Python microservices, a Redis queue, a Postgres instance, an Nginx reverse proxy \u2014 all wired together with a <code>docker-compose.yml<\/code> that had clearly been written in a hurry and never revisited. The team lead had left a sticky note in the README that said, verbatim: &#8220;we should probably move this to Kubernetes at some point.&#8221;<\/p>\n<p>That sticky note started a long argument with myself.<\/p>\n<p>I ended up running both. Not as an experiment \u2014 as an actual business decision I had to defend, twice, to different stakeholders. 
What follows is what I learned, what I got wrong, and where I landed.<\/p>\n<hr \/>\n<h2>Docker Compose in 2026 Is Not What You Used Five Years Ago<\/h2>\n<p>The version of Compose I inherited was using some 3.x syntax with deprecated options. First thing I did was migrate to Compose v2.32 (which ships bundled with Docker Desktop and the Docker CLI now \u2014 no separate install needed). That alone fixed several subtle networking headaches.<\/p>\n<p>Thing is, Compose has gotten genuinely good at what it was always meant to do. <code>compose watch<\/code> has been stable for a while now, and it changed how I think about local development:<\/p>\n<pre><code class=\"language-yaml\"># docker-compose.yml \u2014 inference service, 2026\nservices:\n  api:\n    build: .\/api\n    ports:\n      - &quot;8000:8000&quot;\n    develop:\n      watch:\n        - action: sync\n          path: .\/api\/src\n          target: \/app\/src\n        # Rebuild only when dependencies change, not on every save\n        - action: rebuild\n          path: .\/api\/requirements.txt\n    environment:\n      - MODEL_PATH=\/models\/bert-base\n    volumes:\n      - .\/models:\/models:ro  # mount model weights read-only, not baked into image\n\n  worker:\n    build: .\/worker\n    depends_on:\n      redis:\n        condition: service_healthy\n    develop:\n      watch:\n        - action: sync+restart\n          path: .\/worker\/src\n          target: \/app\/src\n\n  redis:\n    image: redis:7.4-alpine\n    healthcheck:\n      test: [&quot;CMD&quot;, &quot;redis-cli&quot;, &quot;ping&quot;]\n      
interval: 5s\n      retries: 5\n<\/code><\/pre>\n<p>That <code>sync+restart<\/code> action for the worker is something I use constantly \u2014 it syncs files then restarts the process without a full image rebuild. Saves probably 40 seconds per iteration cycle when you&#8217;re deep in debugging.<\/p>\n<p>For a team our size (four engineers, two of whom are ML researchers who don&#8217;t want to think about infrastructure), Compose has a near-zero learning curve. I can write a <code>docker-compose.yml<\/code>, push it to the repo, and anyone can <code>docker compose up<\/code> without reading a manual. That matters more than people admit.<\/p>\n<p>On a single host \u2014 even a beefy one like an EC2 <code>m7i.4xlarge<\/code> \u2014 Compose handles more than you&#8217;d think. I&#8217;ve run services doing 400 req\/s on a single host with Compose and it was fine. The constraint is the host, not Compose.<\/p>\n<p>If your service fits on one host and your team is small, defaulting to Compose isn&#8217;t laziness \u2014 it&#8217;s a reasonable engineering decision with real payoff in operational simplicity.<\/p>\n<hr \/>\n<h2>Where Kubernetes Actually Earns Back Its Complexity Tax<\/h2>\n<p>I did eventually move part of the system to Kubernetes. Not all of it \u2014 more on that in a moment \u2014 but the inference serving component specifically, because we started getting requests for GPU-backed endpoints and that&#8217;s where Compose genuinely hits a wall.<\/p>\n<p>Running GPU workloads across multiple nodes is one of those things K8s is legitimately built for. The NVIDIA GPU Operator on K8s 1.35 has become much more stable than it was back in the 1.28 era \u2014 I remember hitting a specific issue where device plugin pods would crash on node drain (somewhere around kubernetes\/kubernetes#118506, I&#8217;d have to dig). 
By 1.33 that class of issue was mostly sorted. GPU scheduling on multi-node K8s is now a solved problem in a way it genuinely wasn&#8217;t two years ago.<\/p>\n<p>The second payoff: HorizontalPodAutoscaler against custom metrics. We pipe inference latency from Prometheus into KEDA, and the autoscaler responds to queue depth and p95 latency \u2014 not just CPU. That&#8217;s not something you replicate with Compose without building significant custom tooling.<\/p>\n<pre><code class=\"language-yaml\"># hpa.yaml \u2014 scales inference pods on queue depth + latency\napiVersion: keda.sh\/v1alpha1\nkind: ScaledObject\nmetadata:\n  name: inference-scaler\nspec:\n  scaleTargetRef:\n    name: inference-deployment\n  minReplicaCount: 2\n  maxReplicaCount: 20\n  triggers:\n    - type: prometheus\n      metadata:\n        serverAddress: http:\/\/prometheus:9090\n        metricName: inference_queue_depth\n        threshold: &quot;15&quot;  # scale up when &gt;15 items queued per pod\n        query: sum(inference_queue_depth) \/ count(kube_pod_info{pod=~&quot;inference.*&quot;})\n    - type: prometheus\n      metadata:\n        serverAddress: http:\/\/prometheus:9090\n        metricName: inference_p95_latency_ms\n        threshold: &quot;800&quot;\n        # aggregate buckets across pods so the query returns a single series\n        query: histogram_quantile(0.95, sum by (le) (rate(inference_duration_bucket[2m]))) * 1000\n<\/code><\/pre>\n<p>Rolling deployments are the other thing worth mentioning. With Compose, <code>docker compose up --force-recreate<\/code> on a single host means downtime \u2014 or you&#8217;re writing your own health-check loop. K8s rolling updates with a proper <code>readinessProbe<\/code> mean zero-downtime deploys without having to think about it. 
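<\/p>\n<p>As a minimal sketch of that setup (the image name, port, and <code>\/healthz<\/code> endpoint here are illustrative assumptions, not our actual manifest):<\/p>\n<pre><code class=\"language-yaml\"># deployment.yaml: rolling-update sketch; image and probe path are hypothetical\napiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: inference-deployment\nspec:\n  replicas: 3\n  strategy:\n    type: RollingUpdate\n    rollingUpdate:\n      maxUnavailable: 0   # never drop below full capacity during a rollout\n      maxSurge: 1         # bring up one replacement pod at a time\n  selector:\n    matchLabels:\n      app: inference\n  template:\n    metadata:\n      labels:\n        app: inference\n    spec:\n      containers:\n        - name: inference\n          image: registry.example.com\/inference:v2   # hypothetical image\n          readinessProbe:\n            httpGet:\n              path: \/healthz\n              port: 8000\n            initialDelaySeconds: 10   # model load takes a while\n            periodSeconds: 5\n<\/code><\/pre>\n<p>With <code>maxUnavailable: 0<\/code>, old pods are only drained once a new pod passes its readiness probe, which is what makes the deploy zero-downtime. 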
I pushed a model update on a Friday afternoon once (yes, I know) and the rollout was fine because the cluster waited for new pods to be healthy before draining the old ones. I would not have taken that risk with Compose on a single host.<\/p>\n<p>That said \u2014 and I want to be direct about this \u2014 the K8s cluster costs us roughly $340\/month more than a comparable Compose deployment on a single large instance would. That&#8217;s real money for a side project or an early-stage product. The break-even only works if you&#8217;re at a scale where the autoscaling savings outweigh the base cluster cost, or if you genuinely need multi-node availability.<\/p>\n<hr \/>\n<h2>The ML Workload Angle I Didn&#8217;t Anticipate<\/h2>\n<p>I thought I&#8217;d have a clear answer here. I didn&#8217;t.<\/p>\n<p>I assumed moving ML inference to K8s would also mean moving training jobs there. Same cluster, same GPU nodes, everything in one place \u2014 seemed logical. What I actually found was that training jobs are weird. They&#8217;re batch, they&#8217;re stateful in an awkward way, they need specific environment setup that changes frequently, and the feedback loop when something goes wrong is slow.<\/p>\n<p>I ran training jobs as K8s Jobs with <code>ttlSecondsAfterFinished<\/code> for a few months. Fine in theory. In practice, every time an ML researcher wanted to tweak the data pipeline or swap a tokenizer, they were waiting on me to update a ConfigMap or rebuild an image. I had become a gatekeeper for changes that had nothing to do with infrastructure \u2014 which is a bad sign.<\/p>\n<p>So I moved training back to Compose \u2014 on a dedicated GPU box, not the K8s cluster. Training runs as <code>docker compose -f compose.train.yml up<\/code> with the model checkpoint directory mounted as a volume. Researchers can modify it directly. 
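<\/p>\n<p>The training Compose file is roughly this shape (the service name, build path, and mount points are illustrative assumptions, not our actual file):<\/p>\n<pre><code class=\"language-yaml\"># compose.train.yml: single-box GPU training sketch; paths are hypothetical\nservices:\n  train:\n    build: .\/training\n    command: python train.py --config \/config\/run.yaml\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: all\n              capabilities: [gpu]\n    volumes:\n      - .\/checkpoints:\/checkpoints    # checkpoints survive the container\n      - .\/training\/config:\/config     # researchers edit configs directly, no rebuild\n<\/code><\/pre>\n<p>Because the source and config directories are bind-mounted, a researcher changes a file and reruns <code>docker compose -f compose.train.yml up<\/code>; no image rebuild, no ConfigMap, no gatekeeper. 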
Inference serving stays on K8s where the availability and scaling story matters.<\/p>\n<p>I genuinely didn&#8217;t see that split coming. I thought &#8220;K8s for ML&#8221; was the obvious move. The reality: K8s is great for serving (stateless, latency-sensitive, scaling matters) and overkill for training (stateful, batch, where iteration speed matters more than orchestration).<\/p>\n<hr \/>\n<h2>The Signals I Now Actually Use to Decide<\/h2>\n<p>After 18 months of this, the heuristic I&#8217;ve landed on is less about features and more about team and workload shape.<\/p>\n<p>Compose is the right call when your service runs on one host without strain, your team has fewer than six or seven engineers touching infrastructure, and you&#8217;re iterating fast enough that deployment simplicity directly affects development speed. Also \u2014 and I feel strongly about this \u2014 if the people running the service are primarily not infrastructure engineers, Compose&#8217;s operational model is far more forgiving. A <code>docker compose logs -f worker<\/code> is something anyone can run. 
A <code>kubectl logs -n production -l app=worker --since=1h<\/code> is a command you need to look up, at least at first.<\/p>\n<p>Kubernetes makes sense when you need to schedule across multiple nodes (GPUs, memory isolation, availability zones), when you have autoscaling requirements that respond to custom signals, when your team has dedicated platform or SRE capacity to own the cluster, or when your availability requirements are strict enough that single-host failure isn&#8217;t acceptable.<\/p>\n<p>One thing I want to push back on: the idea that Kubernetes is automatically &#8220;more production-ready.&#8221; I&#8217;ve seen Compose deployments that were stable and well-monitored, and K8s clusters that were a disaster of misconfigured RBAC, stale CRDs, and nobody who actually understood the control plane. The tool doesn&#8217;t make you production-ready. The operational discipline does.<\/p>\n<hr \/>\n<h2>What I&#8217;d Actually Tell You to Do<\/h2>\n<p>Start with Compose. Not because K8s is bad \u2014 it isn&#8217;t \u2014 but because you&#8217;ll hit the limits of Compose in very specific, recognizable ways. You&#8217;ll know when you need multi-node scheduling because you&#8217;ll be staring at a GPU allocation problem that Compose can&#8217;t solve. You&#8217;ll know when you need cluster-level autoscaling because you&#8217;ll have just manually scaled your single host twice in a week and you&#8217;re annoyed about it.<\/p>\n<p>When you hit those specific walls, migrate that specific component. 
Not everything at once.<\/p>\n<p>The worst outcome I&#8217;ve seen is teams migrating entirely to K8s before they have the scale to justify it, then spending their first six months of product development fighting cluster configuration instead of shipping features. Kubernetes is powerful and I use it every day, but complexity has a real cost and that cost lands on your team&#8217;s velocity.<\/p>\n<p>Anyway. The sticky note in the README \u2014 I never did &#8220;move everything to Kubernetes.&#8221; I moved the inference serving layer and kept the rest on Compose. The system is faster, more reliable, and cheaper to operate than a full K8s migration would have been. Sometimes the boring answer is the right one.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Eighteen months ago I inherited a 
mess.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[1],"tags":[],"class_list":["post-408","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/408","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/comments?post=408"}],"version-history":[{"count":7,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/408\/revisions"}],"predecessor-version":[{"id":551,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/408\/revisions\/551"}],"wp:attachment":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/media?parent=408"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/categories?post=408"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/tags?post=408"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}