Optimizing ArgoCD at Scale: Helm & HA Tuning, Selective Reconciliation

Loga.dev · 7 min read

“Running a few GitOps apps is trivial. Running a thousand across clusters without downtime takes deliberate design.”

ArgoCD’s defaults are meant for learning and light workloads. When your platform hosts hundreds or thousands of microservices, those defaults quickly become a bottleneck. This post is the first in our ArgoCD Optimization series. We’ll cover high-availability (HA) Helm settings and controller concurrency—the foundation for scaling ArgoCD horizontally and vertically—plus webhook‑driven refresh and selective sync. Later posts will cover rate limiting, reconciliation jitter and ApplicationSet sharding.


1. Turn on HA mode, don’t just talk about it

ArgoCD ships a non‑HA deployment by default. For mission‑critical platforms, replicate its components and run Redis in sentinel mode. In the Helm chart, enable redis-ha and bump replicas:

redis-ha:
  enabled: true     # run redis in sentinel mode
controller:
  replicas: 1       # StatefulSet; extra replicas shard work by cluster
server:
  replicas: 2       # stateless API; run at least two pods
repoServer:
  replicas: 2       # manifest generation; scale up for large repos
applicationSet:
  replicas: 2       # if you use ApplicationSets

You need at least three worker nodes: the redis-ha chart runs three Redis replicas with pod anti-affinity, so each replica must land on a separate node.

In production, the application controller is your bottleneck. It reconciles app state and performs sync operations via two work queues—status (milliseconds) and operations (seconds). By default there are 20 status processors and 10 operation processors. That’s fine for a handful of apps; it fails miserably at 1000+.

  • Increase --status-processors to ~50 and --operation-processors to ~25 for roughly 1000 applications. Adjust proportionally as you scale.
  • If manifest generation stalls, increase --repo-server-timeout-seconds and scale the repo-server so it can handle parallel renders.
  • Limit concurrent kubectl fork/exec calls via --kubectl-parallelism-limit to avoid out-of-memory kills.

These flags can be passed via Helm’s controller.extraArgs or by patching the StatefulSet. Kubernetes HPA can target StatefulSets, but the controller shards work by replica count, so scale it deliberately: ArgoCD’s work queues expose Prometheus metrics like argocd_app_reconcile and argocd_app_k8s_request_total—use them to size resources and alert on backlogs.
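Putting the knobs above together, a values-file sketch might look like this (the numbers are the illustrative targets from this post, not universal defaults—tune them against your own metrics):

```yaml
controller:
  extraArgs:
    - --status-processors=50            # ~1000 apps; default is 20
    - --operation-processors=25         # default is 10
    - --kubectl-parallelism-limit=10    # cap concurrent kubectl fork/execs
    - --repo-server-timeout-seconds=120 # allow slow manifest generation
```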


2. Concurrency isn’t just CPU—allow parallel renders safely

When multiple Helm apps share the same directory, ArgoCD serializes manifest generation to avoid race conditions. This can cripple large monorepos. Create a .argocd-allow-concurrency file in your Helm chart or Kustomize directory to tell ArgoCD that manifest rendering has no side effects. Removing side effects (like writing temp files) lets the repo-server generate manifests in parallel.

For Helm charts with conditional dependencies, keep dependencies flat or template them ahead of time. The goal is simple: avoid touching the Git working tree during rendering so ArgoCD can process multiple applications concurrently.


3. Adjust reconciliation interval—webhooks will do the rest

By default, the controller polls Git every three minutes to detect changes. At small scale that’s convenient. At large scale it causes constant churn—even if nothing has changed—and still leaves a three‑minute lag for deployments.

Use two levers:

  1. Increase the poll interval by setting timeout.reconciliation in the argocd-cm ConfigMap. Use a duration string like 10m or 1h. Longer intervals reduce baseline load on your repo-server and controller.
  2. Configure Git webhooks so ArgoCD receives push notifications and refreshes the affected apps immediately, bypassing the poll interval. For GitHub, GitLab and other providers, point the webhook to https://argocd.example.com/api/webhook and (optionally) set a shared secret. Tune webhook.maxPayloadSizeMB in argocd-cm to avoid DDoS via oversized payloads.

With webhooks in place, you can safely increase timeout.reconciliation to minutes or even hours. ArgoCD will still refresh promptly on every commit but won’t waste resources polling Git.
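Both levers live in the argocd-cm ConfigMap. A minimal sketch (the one-hour interval and payload cap are illustrative choices, not recommendations for every setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  timeout.reconciliation: 1h      # fall back to polling only once an hour
  webhook.maxPayloadSizeMB: "50"  # reject oversized webhook payloads
```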


4. Selective sync: only refresh what changed

Large monorepos suffer when every commit invalidates the manifest cache. ArgoCD caches manifests per commit SHA; a new commit invalidates the cache for all apps. That means thousands of apps may reconcile even though only one changed.

Add the argocd.argoproj.io/manifest-generate-paths annotation to each application to tell ArgoCD which directories matter. During a webhook event, ArgoCD compares the changed files from the payload with the paths in this annotation; if nothing matches, it skips reconciliation.

Example:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders-service
  annotations:
    # Refresh only when files under this app's path ('services/orders')
    # or the sibling 'services/shared' directory change
    argocd.argoproj.io/manifest-generate-paths: .;../shared
spec:
  source:
    repoURL: https://github.com/your-org/platform.git
    targetRevision: main
    path: services/orders

This pattern dramatically reduces unnecessary refreshes and keeps the repo-server cache hot. Paths beginning with / are resolved from the repository root; separate multiple paths with semicolons.
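For example, the same intent expressed with repo-root paths—this is just the metadata fragment of the Application above, with hypothetical directory names:

```yaml
metadata:
  annotations:
    # '/' anchors each path at the repository root
    argocd.argoproj.io/manifest-generate-paths: /services/orders;/services/shared
```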

Manifest path annotation support only works with GitHub, GitLab and Gogs webhooks. Webhook events from other providers ignore the annotation, so factor that into your choice of provider and repo layout.


What’s next

This post laid the groundwork: HA deployment, controller concurrency, longer poll intervals with webhooks, and selective sync via annotations. Together, these changes ensure ArgoCD doesn’t collapse under the weight of your platform. In the next installment we’ll dive deeper into rate limiting, reconciliation jitter and ApplicationSet sharding to further smooth out large-scale operations.

Stay tuned—and remember that optimization is a journey, not a checkbox.