Keep Cloudflare Containers Warm: Eliminate Cold Starts with a Healthcheck Endpoint

Fred · AI Engineer & Developer Educator · 11 min read

Last tested: December 2025 | Platform: Cloudflare Containers | Languages: Node.js, Python

Cloudflare Containers are impressive. You get a full container runtime—any language, any framework—replicated and distributed across Cloudflare's global network. Once warm, they respond with almost no latency. The problem is getting them warm in the first place.

If you're running anything heavier than a basic web server, cold starts can hit 3-5 seconds or more. That's an eternity for your first user of the day, or after any quiet period. Unlike standard Workers which spin up in milliseconds, containers need to pull images, start runtimes, and initialize your application. There's no magic bullet to eliminate this entirely, but you can make cold starts rare enough that most users never experience them.

The solution is the same pattern that's worked for AWS Lambda and GCP Cloud Functions for years: keep the container warm with scheduled pings.

What Cold Starts Actually Cost You

When a Cloudflare Container goes cold, the next request triggers a cascade of operations. The container image needs to be pulled (if not cached nearby). The runtime starts and your application boots. Any initialization code runs—database connections open, config files parse, models load into memory. Only then does your actual request handler run.

For a simple Node.js server, this might add 1-2 seconds. For a Python application with heavy dependencies like image processing libraries, you can easily hit 5 seconds or more. One user on Reddit reported seeing "up to 5 seconds before any upload packets are sent" on their image processor container—and that was on one of the larger container sizes.

The first-byte delay isn't just annoying. It kills perceived performance. Users abandon pages that don't respond within 3 seconds. APIs that time out look broken. Your container might be incredibly fast once running, but none of that matters if the cold start already drove users away.

The Healthcheck Endpoint Pattern

The fix is straightforward: add a lightweight endpoint that your container can respond to instantly, then ping it on a schedule from outside. This keeps at least one container instance warm and ready for real traffic.

The critical rule for your healthcheck endpoint is that it must not initialize heavy dependencies. No image libraries, no database connections, no model loading, no cache warming. It should return immediately with minimal work—ideally just a 204 No Content response. If your healthcheck triggers the same initialization as your real endpoints, you've defeated the entire purpose.

Think of it as a separate code path that proves the container is running without actually doing anything useful. Real requests get the full initialization. Healthcheck requests get a fast "yes I'm alive" and nothing more.

Node.js Implementation

Here's a complete Node.js server with a proper healthcheck endpoint. Notice how the heavy initialization is guarded behind a lazy-loading flag that only triggers on real requests.

import http from 'node:http';

const PORT = process.env.PORT || 8080;

// Heavy dependencies stay uninitialized until needed
let sharp;
let pipelineReady = false;

async function initializePipeline() {
  if (pipelineReady) return;

  // This is the slow part - only do it for real requests
  sharp = (await import('sharp')).default;
  // Any other expensive setup goes here
  pipelineReady = true;
  console.log('Pipeline initialized');
}

const server = http.createServer(async (req, res) => {
  // Healthcheck: return immediately, skip all initialization
  if (req.method === 'GET' && req.url === '/healthcheck') {
    res.writeHead(204, {
      'cache-control': 'no-store',
      'content-type': 'text/plain'
    });
    return res.end();
  }

  // Real request: initialize dependencies if needed
  await initializePipeline();

  // Your actual request handling goes here
  if (req.method === 'POST' && req.url === '/resize') {
    // Process image with sharp...
    res.writeHead(200, { 'content-type': 'application/json' });
    return res.end(JSON.stringify({ success: true }));
  }

  res.writeHead(404, { 'content-type': 'text/plain' });
  res.end('Not found');
});

server.listen(PORT, () => {
  // Don't do heavy work here - this runs on cold start
  console.log(`Server listening on port ${PORT}`);
});

The healthcheck path returns a 204 response in microseconds. It doesn't touch the sharp import or any other expensive dependencies. When a real request comes in on /resize, that's when initializePipeline() runs for the first time.

If you're using Express, the pattern is the same but with Express routing:

import express from 'express';

const app = express();
const PORT = process.env.PORT || 8080;

let pipelineReady = false;

async function ensurePipeline(req, res, next) {
  if (req.path === '/healthcheck') return next();

  if (!pipelineReady) {
    try {
      // Lazy-load heavy dependencies
      const sharp = (await import('sharp')).default;
      app.locals.sharp = sharp;
      pipelineReady = true;
    } catch (err) {
      // Express 4 won't surface errors thrown in async middleware on its own
      return next(err);
    }
  }
  next();
}

app.use(ensurePipeline);

// Healthcheck bypasses all middleware except the check above
app.get('/healthcheck', (req, res) => res.status(204).end());

// Real endpoints get the initialized pipeline
app.post('/resize', async (req, res) => {
  const sharp = req.app.locals.sharp;
  // Process with sharp...
  res.json({ success: true });
});

app.listen(PORT, () => console.log(`Listening on ${PORT}`));

The ensurePipeline middleware checks whether the request is for /healthcheck and skips initialization if so. Every other route gets the full dependency load, and any failure during that load is handed to Express's error handler. This keeps your healthcheck fast while ensuring real requests have everything they need.
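To verify the two paths really behave differently, you can time them locally with curl (assuming the server is listening on port 8080):

# Healthcheck: should return 204 near-instantly
curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' http://localhost:8080/healthcheck

# First real request: pays the one-time lazy-initialization cost
curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' -X POST http://localhost:8080/resize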

Python Implementation

The same pattern works in Python. Here's a FastAPI implementation with lazy loading:

from fastapi import FastAPI, Response
import os

app = FastAPI()

# Heavy imports are deferred: the actual loading happens in get_processor()
_processor = None

def get_processor():
    global _processor
    if _processor is None:
        # This is the slow part - heavy imports happen here
        from PIL import Image
        import numpy as np
        _processor = {'pil': Image, 'np': np}
        print('Processor initialized')
    return _processor

@app.get('/healthcheck')
def healthcheck():
    # Return immediately - no initialization
    return Response(status_code=204)

@app.post('/resize')
async def resize():
    # Real request: initialize if needed
    proc = get_processor()
    # Use proc['pil'] and proc['np'] for image processing...
    return {'success': True}

if __name__ == '__main__':
    import uvicorn
    port = int(os.environ.get('PORT', 8080))
    uvicorn.run(app, host='0.0.0.0', port=port)

The get_processor() function caches the loaded modules in a module-level global, so the expensive imports happen only once per process, and only when a real endpoint calls it. The healthcheck endpoint never triggers this initialization; it just returns 204 and exits.

For Flask, the approach is identical:

from flask import Flask, Response
import os

app = Flask(__name__)
_processor = None

def get_processor():
    global _processor
    if _processor is None:
        from PIL import Image
        import numpy as np
        _processor = {'pil': Image, 'np': np}
    return _processor

@app.get('/healthcheck')
def healthcheck():
    return Response(status=204)

@app.post('/resize')
def resize():
    proc = get_processor()
    return {'success': True}

if __name__ == '__main__':
    port = int(os.environ.get('PORT', 8080))
    app.run(host='0.0.0.0', port=port)

Run with Gunicorn in production: gunicorn -b 0.0.0.0:${PORT:-8080} app:app
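One Gunicorn caveat: each worker process holds its own _processor, so every worker pays the lazy-initialization cost on its own first real request. With two workers, for example:

# Each of the two workers lazy-loads the pipeline independently
gunicorn -w 2 -b 0.0.0.0:${PORT:-8080} app:app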

The Cron Worker: Keeping Things Warm

Your container has a healthcheck endpoint. Now you need something to ping it on a schedule. The cleanest solution is a Cloudflare Worker with a Cron Trigger—it stays entirely within Cloudflare's ecosystem and costs nothing on the free tier.

Create a new Worker project for your warmer:

mkdir container-warmer && cd container-warmer
npm init -y

Create wrangler.toml with your cron schedule:

name = "container-warmer"
main = "src/index.ts"
compatibility_date = "2025-12-01"

[triggers]
crons = ["*/5 * * * *"]

The */5 * * * * expression runs every 5 minutes. Adjust based on how warm you need the container—more frequent pings mean warmer containers but more Worker invocations.

Create src/index.ts with the scheduled handler:

export interface Env {
  CONTAINER_URL: string;
  AUTH_TOKEN?: string;
}

export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    ctx.waitUntil(pingContainer(env));
  }
};

async function pingContainer(env: Env) {
  const headers: Record<string, string> = {
    'user-agent': 'cloudflare-container-warmer/1.0'
  };

  // Optional: protect your healthcheck with a bearer token
  if (env.AUTH_TOKEN) {
    headers['authorization'] = `Bearer ${env.AUTH_TOKEN}`;
  }

  try {
    const response = await fetch(env.CONTAINER_URL, {
      method: 'GET',
      headers
    });

    if (response.ok) { // ok covers all 2xx, including the 204 healthcheck
      console.log('Container warmed successfully');
    } else {
      console.log(`Warm ping returned status: ${response.status}`);
    }
  } catch (error) {
    console.log(`Warm ping failed: ${error}`);
  }
}

Set your environment variables and deploy:

# Set the container's healthcheck URL
npx wrangler secret put CONTAINER_URL
# Enter: https://your-container.your-subdomain.workers.dev/healthcheck

# Optional: set an auth token
npx wrangler secret put AUTH_TOKEN

# Deploy the warmer
npx wrangler deploy

The Worker will now ping your container every 5 minutes, keeping at least one instance warm and ready for traffic.
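To confirm the cron is actually firing, stream the warmer's logs and watch for the success message:

npx wrangler tail container-warmer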

Protecting Your Healthcheck

Without protection, anyone can hit your healthcheck endpoint and keep your container running—which you pay for. If you want to lock it down, add a simple bearer token check:

// In your container's healthcheck handler
app.get('/healthcheck', (req, res) => {
  const auth = req.headers['authorization'];
  const expected = `Bearer ${process.env.HEALTHCHECK_TOKEN}`;

  if (process.env.HEALTHCHECK_TOKEN && auth !== expected) {
    return res.status(401).end();
  }

  res.status(204).end();
});

Set the same token in both your container and your warmer Worker. External requests without the token get rejected, but your scheduled pings get through.
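A plain equality check is usually fine for a warm ping, but if you want the comparison to be constant-time, Node's built-in crypto.timingSafeEqual can replace it. A minimal sketch (the tokensMatch helper is illustrative, not part of the earlier example):

import { timingSafeEqual } from 'node:crypto';

function tokensMatch(provided, expected) {
  // timingSafeEqual requires equal-length buffers and throws otherwise,
  // so compare lengths first (this leaks only the length, which is fine here)
  const a = Buffer.from(provided ?? '');
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

// In the healthcheck handler:
// if (!tokensMatch(req.headers['authorization'], `Bearer ${process.env.HEALTHCHECK_TOKEN}`)) {
//   return res.status(401).end();
// }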

What Warming Cannot Guarantee

Scheduled pings reduce cold starts significantly, but they don't eliminate them completely. Cloudflare can still evict your container instance based on resource pressure or their own scaling logic. If traffic spikes and multiple instances spin up, only the ones receiving your warm pings stay hot—new instances still cold-start.

There's also no hard SLA for "always warm." Cloudflare optimizes for their global infrastructure, not your specific container's uptime. Think of warming as a best-effort strategy that works most of the time, not a guarantee.

That said, in practice, a 5-minute ping interval keeps containers warm enough that cold starts become rare events rather than common ones. Most of your users will never experience them.

The right ping interval depends on your tolerance for cold starts versus cost:

Every 2-3 minutes: Very warm. Cold starts become extremely rare. Higher Worker invocation count.

Every 5 minutes: The sweet spot for most applications. Container stays warm enough for consistent performance without excessive pings.

Every 10-15 minutes: Mostly warm. You'll see occasional cold starts during quiet periods, but still far better than no warming at all.

Every 30+ minutes: Minimal warming. Good if you just want to prevent the container from being evicted entirely, but cold starts will still happen regularly.

Start at 5 minutes and adjust based on what you see in your logs. If users report slow first loads, shorten the interval. If your container is handling steady traffic anyway, you might not need warming at all.
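Changing the interval is a one-line edit to the warmer's wrangler.toml. Tightening to every 2 minutes, for example:

[triggers]
crons = ["*/2 * * * *"]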

Additional Performance Tactics

Warming helps, but you can reduce cold start pain even further with these optimizations:

Shrink your container image. Smaller images pull faster. Use slim base images, remove dev dependencies, and clean up any files you don't need at runtime. Every megabyte matters.

Move initialization out of startup. Don't load models, open database connections, or parse large config files in your container's entrypoint. Defer that work to the first real request, or better yet, to specific endpoints that need it.

Preload only what's essential. If you need to accept uploads, make sure your upload handler can start receiving bytes before your image processing pipeline is ready. Acknowledge the request first, process later.

Consider container size. Cloudflare offers different container classes. Larger containers have more resources but may take slightly longer to provision. Test whether your specific workload benefits from upsizing.

The goal is to make your container as lightweight as possible at startup, even if that means lazy-loading some functionality.
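As one sketch of the image-shrinking tactic, a multi-stage Dockerfile for the Node server above might look like this (the node:20-slim base image and server.mjs filename are assumptions, not Cloudflare requirements):

# Build stage: install production dependencies only
FROM node:20-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Runtime stage: ship only the app and its node_modules
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY server.mjs ./
ENV PORT=8080
EXPOSE 8080
CMD ["node", "server.mjs"]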

Putting It All Together

Here's the complete workflow:

  1. Add a /healthcheck endpoint to your container that returns 204 immediately without initializing heavy dependencies
  2. Deploy a Worker with a Cron Trigger that pings your healthcheck every 5 minutes
  3. Optionally protect the healthcheck with a bearer token so only your warmer can access it
  4. Monitor your container logs to verify pings are working and cold starts are decreasing

The implementation takes about 20 minutes. The payoff is a container that feels fast for every user, not just the ones who happen to hit it while it's already warm.

Cloudflare Containers aren't cheap compared to standard Workers, but they let you run anything—any language, any framework, any runtime. Keeping them warm ensures you actually get the performance you're paying for.

Frequently Asked Questions

Why not just use Cloudflare Workers instead of Containers?

Workers are faster and cheaper, but they run JavaScript/TypeScript (or languages compiled to WebAssembly) against a specific set of APIs rather than a full OS. Containers let you run Python, Go, Rust, or any language in a full Linux environment. If your workload requires libraries or runtimes that Workers don't support, Containers are your only option on Cloudflare's edge.

Does this work for multiple container instances?

Your warmer only pings one endpoint, which keeps one instance warm. If Cloudflare scales up to multiple instances under load, the new ones will cold-start. For most applications this is fine—steady traffic keeps additional instances warm naturally.

How much does the warmer Worker cost?

The warmer is a standard Worker, covered by the free tier (100,000 requests/day). Running every 5 minutes uses about 8,640 invocations per month—well within free limits.

Can I use an external uptime monitor instead?

Yes. Services like UptimeRobot, Better Uptime, or Pingdom can hit your healthcheck endpoint on a schedule. The downside is adding an external dependency when a Worker Cron Trigger does the same thing natively.
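If you do, point the monitor at the same URL the warmer uses, and include the token if you set one:

curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $HEALTHCHECK_TOKEN" \
  https://your-container.your-subdomain.workers.dev/healthcheck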


Fred

AUTHOR

Full-stack developer with 10+ years building production applications. I've been deploying to Cloudflare's edge network since Workers launched in 2017.