John

Senior Cloud Engineer & Technical Lead

Mirroring Public Container Images to ECR with Regsync

I’ve been burned by public registries before. There’s nothing quite like watching a production Kubernetes deployment fail because Docker Hub is rate limiting your cluster, or worse, experiencing an outage at the exact moment you need to scale up. After one too many incidents where my deployments were at the mercy of systems I couldn’t control, I decided to take a different approach.

The Problem

In enterprise environments, relying on public container registries for production workloads is a risk that’s easy to overlook until it bites you. The issues are predictable but often ignored:

Availability Dependencies: When you deploy a pod in Kubernetes and the image needs to be pulled from Docker Hub, you’re now dependent on Docker Hub being available. If it’s down, your pod doesn’t start. Your autoscaling doesn’t work. Your disaster recovery fails at the worst possible moment.

Rate Limits: Docker Hub introduced rate limits, and suddenly clusters that pulled images frequently started hitting walls. Anonymous pulls are limited to 100 per six hours per IP address, and even authenticated free accounts only get 200. In a large cluster, where every node egressing through the same NAT gateway counts against a single anonymous limit, that allowance gets exhausted faster than you’d expect.
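
If you want to see where a node stands against that limit, Docker exposes the counters as response headers on its registry API. A quick check from any host with curl and jq (a sketch of the documented header check, using Docker’s rate-limit preview image) looks like this:

# Request an anonymous token for Docker's rate-limit preview repo, then read
# the ratelimit-limit / ratelimit-remaining headers on a HEAD request.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -s --head -H "Authorization: Bearer ${TOKEN}" \
  https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest \
  | grep -i ratelimit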

Airgapped Environments: Many enterprise and government environments operate in airgapped or restricted networks. These environments simply can’t reach public registries, making local mirrors a hard requirement rather than a nice-to-have.

Compliance and Security: Having a known, controlled set of images that have been scanned and approved is a security requirement for many organizations. Pulling directly from public registries means you’re trusting whatever is there at pull time.

The Solution: Regsync with ECR

The answer is to mirror public images to a private registry you control. AWS ECR is a natural choice if you’re already in the AWS ecosystem. While AWS offers pull-through cache as a managed solution, I found that regsync gives me more control over exactly what gets mirrored and when.

The approach I settled on has three components:

  1. A master list of images that need to be mirrored
  2. Terraform to pre-create ECR repositories with appropriate lifecycle policies
  3. A pipeline that runs regsync to sync source images to their ECR targets

Component 1: The Master Image List

I maintain a YAML file that defines every image we need mirrored. This becomes the single source of truth:

# images.yaml
images:
  - source: docker.io/library/nginx
    tags: ["1.25", "1.24", "latest"]
  - source: docker.io/library/redis
    tags: ["7.2", "7.0", "6.2"]
  - source: quay.io/prometheus/prometheus
    tags: ["v2.48.0", "v2.47.0"]
  - source: gcr.io/distroless/static-debian12
    tags: ["nonroot", "latest"]
  - source: public.ecr.aws/docker/library/postgres
    tags: ["16", "15", "14"]

Component 2: Terraform for ECR Repositories

Before syncing, the ECR repositories need to exist. Terraform handles this, creating repositories from the master list with consistent lifecycle policies:

variable "mirrored_images" {
  description = "List of images to mirror"
  type = list(object({
    source = string
    tags   = list(string)
  }))
}

locals {
  # Convert source paths to ECR repository names, e.g.
  # docker.io/library/nginx -> mirror/docker-hub/nginx
  # quay.io/prometheus/prometheus -> mirror/quay/prometheus/prometheus
  # gcr.io/distroless/static-debian12 -> mirror/gcr/distroless/static-debian12
  # public.ecr.aws/docker/library/postgres -> mirror/public-ecr/docker/library/postgres
  ecr_repos = { for img in var.mirrored_images :
    replace(replace(replace(replace(img.source,
      "docker.io/library/", "mirror/docker-hub/"),
      "quay.io/", "mirror/quay/"),
      "gcr.io/", "mirror/gcr/"),
      "public.ecr.aws/", "mirror/public-ecr/") => img
  }
}

resource "aws_ecr_repository" "mirror" {
  for_each = local.ecr_repos

  name                 = each.key
  image_tag_mutability = "IMMUTABLE"  # Prevent tag overwrites

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
  }

  tags = {
    Purpose   = "image-mirror"
    ManagedBy = "terraform"
  }
}

resource "aws_ecr_lifecycle_policy" "mirror" {
  for_each   = aws_ecr_repository.mirror
  repository = each.value.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 10 images per tag prefix"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["v", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Remove untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}
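
To keep the Terraform variable and the YAML file from drifting apart, the variable is fed straight from images.yaml. One way to do that, sketched here with mikefarah’s yq v4 (reading the file directly with Terraform’s yamldecode(file(...)) works just as well), is to render the list into an auto-loaded tfvars file:

# Wrap the image list in the variable name Terraform expects, write it as an
# auto-loaded tfvars file, then plan as usual.
yq -o=json '{"mirrored_images": .images}' images.yaml > mirrored-images.auto.tfvars.json
terraform plan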

Component 3: Regsync Pipeline

With repositories created, regsync handles the actual synchronization. I generate the regsync config from the same master list:

# regsync.yaml
version: 1
creds:
  - registry: docker.io
    user: ""
    pass: ""
  - registry: ".dkr.ecr..amazonaws.com"
    credHelper: ecr-login

defaults:
  rateLimit:
    min: 100ms
  parallel: 4

sync:
  - source: docker.io/library/nginx
    target: ".dkr.ecr..amazonaws.com/mirror/docker-hub/nginx"
    type: repository
    tags:
      allow:
        - "1.25"
        - "1.24"
        - "latest"

  - source: quay.io/prometheus/prometheus
    target: ".dkr.ecr..amazonaws.com/mirror/quay/prometheus/prometheus"
    type: repository
    tags:
      allow:
        - "v2.48.0"
        - "v2.47.0"

The pipeline itself runs on a schedule:

# .github/workflows/image-sync.yaml
name: Sync Container Images

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
  workflow_dispatch:

jobs:
  sync:
    runs-on: self-hosted
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/image-sync-role
          aws-region: us-east-1

      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Install regsync
        run: |
          curl -L https://github.com/regclient/regclient/releases/latest/download/regsync-linux-amd64 -o regsync
          chmod +x regsync

      - name: Run sync
        env:
          DOCKER_USER: ${{ secrets.DOCKER_USER }}
          DOCKER_TOKEN: ${{ secrets.DOCKER_TOKEN }}
          AWS_ACCOUNT_ID: ${{ secrets.AWS_ACCOUNT_ID }}
          AWS_REGION: us-east-1
        run: ./regsync once --config regsync.yaml
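
Before trusting the schedule, it’s worth running the same sync locally. The sketch below assumes your shell already has AWS credentials that can push to ECR and that docker-credential-ecr-login is on your PATH; regsync’s check command reports what would be copied without writing anything, while once performs a single pass and exits:

# Export the same values the workflow injects, then do a dry run and a real pass.
export AWS_ACCOUNT_ID=123456789012   # hypothetical account ID
export AWS_REGION=us-east-1
export DOCKER_USER=my-hub-user       # hypothetical Docker Hub username
export DOCKER_TOKEN=my-hub-token     # hypothetical Docker Hub access token

./regsync check --config regsync.yaml   # report what would change, no writes
./regsync once --config regsync.yaml    # copy missing or outdated tags, then exit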

Why Regsync Over Pull-Through Cache

AWS ECR pull-through cache is convenient, but regsync offers advantages for my use case:

Explicit Control: I know exactly what images are in my registry. With pull-through cache, images get cached on first pull, which means you might not have what you need cached when an outage hits.

Pre-warming: Regsync runs on a schedule, ensuring images are always available before they’re needed. Pull-through cache is reactive.

Multi-source Support: Regsync can pull from Docker Hub, Quay, GCR, GitHub Container Registry, and more. Pull-through cache has a more limited set of supported upstream registries.

Tag Filtering: I can explicitly control which tags get mirrored rather than caching everything that gets pulled.
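
For comparison, enabling pull-through cache is a single rule per upstream; the sketch below uses public.ecr.aws as the upstream since it needs no credentials (Docker Hub as an upstream additionally requires a Secrets Manager secret). The trade-off is that nothing lands in the cache until something pulls through it:

# Create a pull-through cache rule: images requested under the "ecr-public"
# prefix are fetched from public.ecr.aws on first pull and cached in your registry.
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix ecr-public \
  --upstream-registry-url public.ecr.aws \
  --region us-east-1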

Key Learnings

  • Public registry dependencies are a production risk - Outages and rate limits will eventually affect your deployments
  • A master image list as a single source of truth - Makes both Terraform and regsync configuration consistent and auditable
  • Immutable tags in ECR prevent unexpected changes - Combined with lifecycle policies, you get both stability and cost control
  • Scheduled syncing beats reactive caching - Images are available before you need them, not when you first request them
  • Regsync handles multi-registry sources well - One tool to mirror from Docker Hub, Quay, GCR, and other registries
  • The pipeline approach integrates with existing CI/CD - Standard GitHub Actions workflow that fits into existing patterns
  • Pre-creating ECR repos with Terraform ensures consistency - Lifecycle policies, scanning, and encryption are configured uniformly

After implementing this, I no longer worry about Docker Hub outages during critical deployments. The images my clusters need are already waiting in ECR, scanned, and ready to pull at line speed within AWS. It’s one less external dependency to worry about.