Your Apache server doesn't fall over because your code is bad. It falls over because every incoming connection owns a thread, and threads are expensive — and at some point, you simply run out. NGINX was designed from day one to make that problem disappear.
Architecture Overview
NGINX is built around a fundamentally different mental model than the web servers that came before it. Where Apache spawns a process or thread per connection, NGINX treats connections as events — discrete notifications that something is ready to be processed. This shift from a resource-per-connection model to an event-driven, non-blocking I/O model is not a tuning tweak; it is the entire architecture.
NGINX has a master process that performs privileged operations such as reading configuration and binding to ports, plus a set of worker and helper processes. The workers are where all connection handling, request processing, and upstream communication actually happen. Each worker is single-threaded and runs an internal event loop — but that single thread can hold thousands of open connections simultaneously because it never blocks waiting on any one of them.
Instead of blocking on a connection and waiting for a response, NGINX is free to continue processing other events in the queue. This is not concurrency through parallelism; it is concurrency through multiplexing. The operating system's own event notification primitives — epoll on Linux, kqueue on BSD and macOS — do the heavy lifting of watching thousands of file descriptors and notifying NGINX the moment a socket has data ready.
The dominant pattern here is event-driven, asynchronous, non-blocking I/O, structured around a master-worker process model. The architecture is deliberately simple at the top and highly optimised at the bottom.
Component Breakdown
Master Process
The master process, running with root privileges, serves as the orchestrator of the entire system — responsible for reading and validating configuration, managing worker processes, and handling signals. It never touches a single HTTP request. When a reload signal (SIGHUP) is received, the master forks new workers using the updated configuration while the old workers finish draining their existing connections, then exit cleanly. Zero downtime, zero dropped connections. Remove the master process, and you lose the ability to reload configuration, manage worker lifecycle, or perform binary upgrades in place.
Worker Processes
Worker processes do all the work — handling network connections, reading and writing content to disk, and communicating with upstream servers. The recommended configuration is one worker process per CPU core, which makes full use of the hardware without oversubscribing it. Each worker runs its own independent event loop, maintains its own connection pool, and requires no locking or coordination with other workers for standard request processing. If a worker crashes, only its in-flight connections are affected — the master respawns it immediately, and other workers continue uninterrupted.
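In configuration terms this is a single top-level directive; a minimal sketch (the affinity line is optional and the values are illustrative):

```nginx
# Main context: spawn one worker per CPU core.
worker_processes auto;

# Optionally pin each worker to a core to reduce cache-line bouncing under heavy load.
# worker_cpu_affinity auto;
```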
Event Loop and I/O Multiplexing
This is NGINX's core mechanism. Each NGINX worker is single-threaded and runs an event loop. Internally, it uses epoll or kqueue and registers sockets. The OS uses a kernel data structure to track those sockets, and once data is available, the OS notifies the worker. The worker never sits idle waiting; it processes whatever is ready and moves on. Without this layer, NGINX degrades to a blocking model indistinguishable from a naive server — the entire performance profile collapses.
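A minimal events block makes this explicit. On Linux, NGINX selects epoll automatically, so the use directive is optional; the numbers here are illustrative, not recommendations:

```nginx
events {
    # epoll is picked automatically on modern Linux; kqueue on BSD/macOS.
    use epoll;

    # Maximum simultaneous connections each single-threaded worker may hold open.
    worker_connections 10240;

    # Accept every pending connection on each wake-up instead of one at a time.
    multi_accept on;
}
```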
Cache Loader and Cache Manager
The cache loader loads cached metadata into memory when NGINX boots up. The cache manager periodically checks the cache directory and frees space by removing expired entries. Both are short-lived or periodic helper processes — they consume minimal resources and exit or sleep when their job is done. If the cache manager is absent, the on-disk cache grows unbounded and eventually fills the disk.
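Both helpers operate on the zone declared by proxy_cache_path: the loader populates the keys_zone index at startup, and the manager enforces max_size and inactive by evicting entries. The path, zone name, and sizes below are illustrative:

```nginx
http {
    # keys_zone: shared-memory index the cache loader populates at boot.
    # max_size / inactive: limits the cache manager enforces by evicting entries.
    proxy_cache_path /var/cache/nginx/products
                     levels=1:2
                     keys_zone=product_cache:50m
                     max_size=10g
                     inactive=60m;
}
```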
Shared Memory Zones
NGINX uses shared memory zones to share data between worker processes. This is how rate limiting counters, upstream health state, and connection limits stay consistent across all workers without inter-process messaging. Without shared memory zones, each worker would maintain its own isolated view of rate limits — a user could effectively bypass a 10 requests/second limit by hitting NGINX across multiple workers.
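A rate-limit zone illustrates the pattern; the zone name, size, and rate below are illustrative:

```nginx
http {
    # 10 MB shared-memory zone of per-client counters, visible to every worker.
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        location /api/ {
            # All workers consult the same shared counters, so the limit holds
            # regardless of which worker accepted the connection.
            limit_req zone=per_ip burst=20 nodelay;
        }
    }
}
```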
Module System
Functional modules in NGINX are divided into event modules, phase handlers, output filters, variable handlers, protocols, upstreams, and load balancers. Phase handlers process the request at specific points in the HTTP pipeline; output filters transform the response on its way back to the client. Most modules are compiled into the binary; since 1.9.11 NGINX has also supported dynamic modules via the load_module directive, but each module must be built against the exact NGINX source version, so the ecosystem of drop-in binary modules remains far smaller than Apache's. The trade-off is deliberate, favouring performance and stability over flexibility.
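As a concrete illustration, loading one of the officially packaged dynamic modules is a single main-context directive; the module shown is only an example, and the .so must match the running binary's version:

```nginx
# Main context, near the top of nginx.conf, before the events/http blocks.
# The shared object must have been built against this exact NGINX version.
load_module modules/ngx_http_image_filter_module.so;
```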
Data Flow Walkthrough
Scenario: A user requests a product page that NGINX serves as a reverse proxy to a Node.js application server, with caching enabled.
- The client opens a TCP connection to NGINX on port 443. The OS accepts the connection and notifies the relevant worker via epoll.
- The worker's event loop picks up the new connection event. The TLS handshake is performed asynchronously; the worker does not block waiting for it.
- Once the TLS session is established and the HTTP request is fully received, NGINX's HTTP state machine parses the request headers and URI.
- The location block matching logic evaluates the request URI against the configured location directives. The matching location points to an upstream group with caching enabled.
- NGINX checks the proxy cache. It constructs a cache key from the URI and relevant headers, then consults the cache's shared-memory index (populated by the cache loader) to see whether a valid, non-stale response exists on disk. On a cache hit, the cached response is streamed directly to the client — the upstream Node.js server is never contacted. The worker moves on to the next event.
- On a cache miss, the worker opens a non-blocking connection to the upstream Node.js server (using the configured upstream pool) and forwards the request. The worker does not sit and wait — it registers the upstream socket with epoll and resumes other work.
- When the upstream responds, epoll fires. The worker reads the response body, writes it to the proxy cache on disk (in the background, non-blocking), and simultaneously streams it to the client.
- The response is passed through any configured output filter modules — gzip compression, header modification — before transmission.
- If HTTP keep-alive is active, the connection remains open and the worker registers the client socket for the next event. If not, the connection is closed cleanly.
Throughout this entire flow, the worker thread never blocked. It processed dozens of other connections during the microseconds it spent waiting on disk I/O or upstream socket readiness.
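For orientation, a condensed configuration that would produce roughly this flow might look like the sketch below. The upstream addresses, zone name, hostnames, and certificate paths are placeholders, not values from any real deployment:

```nginx
http {
    proxy_cache_path /var/cache/nginx/app keys_zone=app_cache:50m max_size=5g inactive=30m;

    upstream node_app {
        server 10.0.0.11:3000;
        server 10.0.0.12:3000;
        keepalive 32;                          # reuse upstream connections
    }

    server {
        listen 443 ssl;
        server_name shop.example.com;
        ssl_certificate     /etc/nginx/tls/shop.crt;
        ssl_certificate_key /etc/nginx/tls/shop.key;

        location /products/ {
            proxy_cache        app_cache;
            proxy_cache_valid  200 10m;        # cache successful responses for 10 minutes
            proxy_set_header   Host $host;
            proxy_http_version 1.1;
            proxy_set_header   Connection "";  # required for upstream keepalive
            proxy_pass         http://node_app;
        }
    }
}
```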
Design Decisions and Trade-offs
The most consequential decision in NGINX's design is the rejection of the thread-per-connection model. The common way to design network applications is to assign a thread or process to each connection — simple and easy to implement, but it does not scale when the application needs to handle thousands of simultaneous connections. NGINX's answer was to separate the concept of a connection from the concept of a thread, and bind them only when there is actual work to do.
This comes with a real cost: blocking operations are poison. Any call that blocks the event loop — a synchronous file read, a poorly written third-party module, a long-running Lua script — stalls every other connection that worker is handling. NGINX partially addresses this through a thread pool (aio threads) for disk I/O, but the architecture demands discipline from anyone extending it. This is also why NGINX doesn't natively execute dynamic languages like PHP. It deliberately hands that responsibility off to FastCGI or upstream processes, keeping the event loop clean.
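The escape hatch mentioned above is a named thread pool for blocking file reads; a minimal sketch, with an illustrative pool name, sizes, and path:

```nginx
# Main context: a pool of threads reserved for blocking file I/O.
thread_pool disk_io threads=16 max_queue=65536;

http {
    server {
        location /downloads/ {
            # Offload reads of large or uncached files to the thread pool so
            # the event loop is never stalled waiting on the disk.
            aio      threads=disk_io;
            sendfile on;
            directio 4m;   # bypass the page cache for files larger than 4 MB
        }
    }
}
```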
The decision to run one worker per CPU core, each with its own event loop and no shared state beyond shared memory zones, trades inter-process coordination overhead for strict resource isolation. Workers do not lock on connection acceptance by default — enabling SO_REUSEPORT improved peak p99 latency by 33% at Cloudflare by solving uneven load distribution across workers without requiring an accept mutex. The reuseport parameter on the listen directive has been supported in mainline NGINX since 1.9.1 and is now the standard fix for that imbalance.
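Enabling it is a one-word change on the listen directive; the hostname and certificate paths below are placeholders:

```nginx
server {
    # reuseport gives each worker its own listening socket; the kernel then
    # spreads new connections across workers instead of funnelling them
    # through a single shared accept queue.
    listen 443 ssl reuseport;
    server_name example.com;
    ssl_certificate     /etc/nginx/tls/example.crt;
    ssl_certificate_key /etc/nginx/tls/example.key;
}
```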
The largely compile-time module architecture is another explicit trade-off. Apache's dynamic module loading is more flexible, but it introduces runtime overhead and increases the attack surface. NGINX's position is that you know your feature set at build time — bake it in, and eliminate the overhead and unpredictability.
Advantages
Massive connection density at low memory cost. Because NGINX does not allocate a thread or process per connection, its memory footprint grows slowly as connections scale. A worker handling 10,000 keep-alive connections uses a fraction of the memory that 10,000 threads would require. This directly improves memory efficiency and makes high-connection-count deployments economically viable on modest hardware.
Deterministic CPU utilization under load. With worker count pinned to CPU core count and no thread context-switching overhead, NGINX's CPU usage under steady traffic is predictable and flat. Spikes manifest as connection queue depth, not runaway CPU. This makes capacity planning and auto-scaling decisions substantially easier; the engineering property you are buying is operational predictability.
Zero-downtime configuration reload and binary upgrades. When the master receives a SIGHUP, it forks new workers with the updated configuration. These workers immediately begin accepting connections while old workers gracefully drain and exit. Binary upgrades follow the same pattern with two master processes briefly co-existing. This is operational continuity — a property that most load balancers require significant infrastructure to achieve.
Fault isolation between workers. If a worker crashes due to a bad request, only that worker dies. Other workers keep running. The master respawns the failed worker automatically. No single bad request can take down the entire server, which directly improves fault tolerance for production deployments.
Efficient static content delivery. NGINX's event loop, combined with sendfile() system calls for zero-copy file transmission, makes static file serving extraordinarily fast. The CPU is barely involved — the kernel streams the file directly from the page cache to the network socket. This improves throughput for asset-heavy workloads without custom caching infrastructure.
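The directives involved are small and commonly paired; the path and cache lifetime below are illustrative:

```nginx
location /assets/ {
    root       /srv/www;
    sendfile   on;    # zero-copy: the kernel streams the file straight to the socket
    tcp_nopush on;    # fill full TCP packets before sending (used with sendfile)
    expires    7d;    # let clients and CDNs cache static assets
}
```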
Composable reverse proxy and load balancing in a single binary. TLS termination, upstream proxying, response caching, rate limiting, health checks, and weighted load balancing are all available without external dependencies. The operational simplicity of not requiring a separate proxy layer alongside the web server is a significant reduction in failure surface.
Limitations and Failure Modes
Blocking operations destroy the event loop. Any third-party module, Lua script, or custom handler that makes a synchronous blocking call will stall the entire worker for the duration of that call. With a 4-worker configuration, four simultaneous blocking operations bring NGINX to a standstill. The aio threads directive partially mitigates disk I/O blocking, but the constraint is architectural — it cannot be fully engineered away without fundamentally changing the model. Teams that embed scripting logic in NGINX (via OpenResty/Lua) must understand this deeply, or they will create latency cliffs under load.
Dynamic content is not NGINX's job — and proxying adds a hop. NGINX cannot execute PHP, Python, Ruby, or Node.js natively. Every request for dynamic content requires a proxy pass to an upstream process (FastCGI, uWSGI, a TCP backend). That adds a network round-trip, a connection pool to manage, and upstream health check complexity. Apache is faster when delivering dynamic content; NGINX is faster when delivering static content and cached responses. Teams that treat NGINX as an application server will be disappointed and confused by performance characteristics that don't match expectations.
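For example, the canonical hand-off to a PHP-FPM pool is a short FastCGI stanza; the socket path is an assumption that varies by distribution:

```nginx
location ~ \.php$ {
    # NGINX never executes the PHP itself; it forwards the request to a
    # FastCGI upstream (PHP-FPM here) and streams the response back.
    include       fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass  unix:/run/php/php-fpm.sock;
}
```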
Worker imbalance under certain traffic patterns. Without SO_REUSEPORT, the accept mutex used to prevent multiple workers from racing on the same connection can cause uneven load distribution — one worker absorbs a disproportionate share of new connections. This is a known issue with a known fix, but it is not enabled by default in all configurations. The failure mode is not a crash; it is one overloaded worker while others sit mostly idle, producing confusing performance profiles under load testing.
The module ecosystem is limited compared to Apache. Because most modules are compiled in (and dynamic modules must still be built against the exact NGINX version), extending NGINX usually means rebuilding or repackaging the binary. Third-party modules vary significantly in quality and maintenance status. NGINX doesn't interpret .htaccess files, so per-directory configuration — common in shared hosting and some CMS deployments — is simply unavailable. Teams migrating from Apache regularly underestimate the effort involved.
Cache poisoning and stale content risks. The proxy cache is powerful but requires careful cache key design. If cache keys are not constructed to account for Vary headers, authentication state, or query parameters, NGINX will serve one user's response to another. The failure is silent — there is no error, just incorrect content delivered confidently.
When to Use This Architecture
Use it when:
- Your application serves a large volume of static assets, cached API responses, or media files where NGINX can fulfil requests without touching an upstream
- You need TLS termination, request routing, rate limiting, and load balancing in a single, operationally simple layer in front of multiple backend services
- Your backend servers are slower than your clients, and you need NGINX's upstream buffering to prevent slow clients from holding backend connections open
- You are running microservices and need an API gateway-style ingress with fine-grained routing, header manipulation, and upstream health management without the overhead of a full service mesh
Avoid it when:
- Your primary workload is compute-intensive dynamic content generation with minimal static serving — the reverse proxy hop adds complexity without proportional benefit; a well-tuned application server alone may be simpler
- Your team needs extensive per-directory runtime configuration (.htaccess-style) that users or CMS software controls — NGINX's configuration model is centralised and static
- You are embedding complex application logic in Lua or custom modules without a deep understanding of the event loop — the performance gains NGINX provides can be completely negated and then some
Real-World Adoption
Netflix
Netflix became the first customer of NGINX, Inc. after it incorporated in 2011 and chose NGINX as the heart of its delivery infrastructure, Open Connect — one of the largest CDNs in the world. The problem was delivering high-bitrate video streams to millions of concurrent viewers. NGINX's ability to hold massive numbers of open connections while efficiently streaming cached content made it the right fit for an edge server that needed to maximise bytes-out per dollar of hardware.
Cloudflare
At peak, Cloudflare serves more than 10 million requests per second across its data centres, and over the years made significant modifications to its version of NGINX to handle that growth. Cloudflare's engineering blog documents a series of deep NGINX tuning exercises — from SO_REUSEPORT adoption to thread pool experimentation for cache reads. More recently, Cloudflare replaced its NGINX and LuaJIT-based forward proxy layer with a Rust-based system, reporting that the replacement uses less than half the CPU of the NGINX stack. This is arguably the most important public NGINX case study available: it demonstrates both the ceiling of what NGINX can do at extreme scale and the architectural constraints that eventually force companies beyond it.
Dropbox
The Dropbox engineering team shared their experience architecting their global edge network using a custom stack of NGINX and IPVS, connecting to Dropbox backend servers over their backbone network, with GeoDNS and BGP Anycast ensuring availability and low latency. Reference: Dropbox Engineering Blog, "Dropbox Edge Network."
WordPress.com (Automattic)
Automattic runs WordPress.com — one of the highest-traffic shared hosting platforms on the internet — with NGINX handling HTTP traffic in front of PHP-FPM application servers. The combination of NGINX for static serving and upstream buffering, with PHP-FPM handling dynamic rendering, is the canonical production deployment pattern that the NGINX project itself documents and recommends for high-traffic PHP workloads.
Common Pitfalls and Anti-Patterns
Misconfigured worker count and connection limits together. Setting worker_processes auto is correct, but teams often forget to pair it with an appropriately high worker_connections limit and a corresponding OS-level ulimit for open file descriptors. The result is that NGINX silently rejects connections at load — not with an error visible in application logs, but with a brief TCP RST the client sees as a connection failure. Always verify that ulimit -n and the worker_rlimit_nofile directive comfortably exceed worker_connections (roughly double it when proxying, since each proxied request holds both a client descriptor and an upstream descriptor).
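A sketch of the three settings that have to stay in agreement; the numbers are illustrative, not recommendations:

```nginx
worker_processes     auto;      # one worker per core
worker_rlimit_nofile 65536;     # per-worker file-descriptor ceiling

events {
    # Each proxied request can hold two descriptors (client + upstream),
    # so keep worker_connections well below worker_rlimit_nofile.
    worker_connections 20480;
}
```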
Using NGINX as an application server by embedding all logic in Lua. OpenResty (NGINX + LuaJIT) is powerful, but teams routinely migrate business logic into Lua handlers that make synchronous external calls — Redis lookups, database queries, external HTTP fetches — without using the cosocket API for non-blocking I/O. A single blocking Lua call inside a worker stalls every connection that worker holds. The pattern feels safe until traffic picks up, at which point latency increases non-linearly and the root cause is opaque.
Skipping upstream health checks in load-balanced configurations. NGINX open-source's upstream module performs passive health checks only — it marks a server as failed after a connection attempt fails. NGINX Plus offers active health checks. Teams running the open-source version who do not configure max_fails and fail_timeout correctly will find that a failed backend keeps receiving traffic until live requests fail against it. The failure is not graceful; real users absorb the error responses while NGINX catches up.
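A minimal sketch of the passive-check parameters on an open-source upstream block; the addresses and thresholds are illustrative:

```nginx
upstream api_backend {
    # After 3 failed attempts within 30s, a server is marked unavailable for
    # 30s; real requests still absorb the failures that trip the counter.
    server 10.0.0.21:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.22:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.23:8080 backup;   # only used when the others are down
}
```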
Treating the proxy cache as a CDN without understanding cache key design. A common mistake is enabling proxy_cache without carefully reviewing the proxy_cache_key directive. The default key includes $scheme$proxy_host$request_uri but omits cookies, authentication headers, and Vary response headers. Applications that serve different content based on session state or Accept-Language headers will deliver incorrect cached responses to users if the cache key doesn't capture those dimensions.
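One way to fold those dimensions into the key, assuming a hypothetical session cookie and language-negotiated content; the zone, upstream, and variable choices are illustrative, not a universal recipe:

```nginx
location /api/catalog/ {
    proxy_cache app_cache;
    # Extend the default key so language and session variants are cached separately.
    proxy_cache_key "$scheme$proxy_host$request_uri$http_accept_language$cookie_session_id";
    # Alternatively, refuse to cache authenticated traffic at all.
    proxy_no_cache     $cookie_session_id;
    proxy_cache_bypass $cookie_session_id;
    proxy_pass http://node_app;
}
```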
Closing Thought
NGINX's event-driven architecture solved a specific problem — the C10K problem — brilliantly and durably. What the last two decades of production deployments have shown is that the architecture's real constraint isn't connections; it's the assumption that request processing is fast and I/O-bound. The moment you push complex compute, blocking calls, or stateful logic into the NGINX layer, the elegance of the event loop becomes a trap. Cloudflare's eventual migration away from NGINX at their scale isn't a verdict against the architecture — it's a data point about where the ceiling sits when you operate at a trillion requests per day. For everyone else, that ceiling is extraordinarily far away, and NGINX remains one of the most well-engineered pieces of infrastructure software ever written.
