Description
Apache HTTP Server (httpd) has been the backbone of the web since 1995, and understanding why it still powers millions of servers comes down to one design decision: a modular, process/thread-based architecture built around a pluggable runtime called the Multi-Processing Module (MPM).
At its core, Apache follows a request-response pipeline where every incoming HTTP connection is handled by a worker (process or thread, depending on the MPM), and that worker passes the request through a series of hooks — each one delegated to a module. That's the whole system. Everything else — SSL, URL rewriting, authentication, compression, logging — is a module that hooks into that pipeline.
The Main Components
1. Core (httpd core)
The immovable foundation. Handles configuration parsing (httpd.conf), virtual host resolution, request lifecycle orchestration, and the hook/module registration system. Apart from a default handler for plain files, it serves no content itself; its job is to ensure the right modules run in the right order.
2. Multi-Processing Modules (MPMs)
This is where Apache's concurrency model lives. Three choices:
- Prefork MPM — spawns a pool of single-threaded child processes. Each process handles one connection at a time. Crash-safe (one bad request kills one process, not all) but memory-heavy. The legacy default.
- Worker MPM — hybrid model. Each child process spawns multiple threads. Lower memory footprint than Prefork. Standard for high-load setups without mpm_event.
- Event MPM — the modern default since Apache 2.4. Decouples connection handling from request processing. A dedicated listener thread per child process handles keep-alive connections, passing active requests to worker threads. This solves the keep-alive problem that plagued Worker.
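The difference between the three models can be made concrete with a toy calculation. The sketch below is not httpd code; the function name `busy_workers` and the connection counts are illustrative assumptions. It only captures the idea stated above: under Prefork and Worker an idle keep-alive connection still pins a worker, while under Event it is parked with the listener.

```python
def busy_workers(active_requests: int, idle_keepalive: int, mpm: str) -> int:
    """Toy model: how many workers (processes or threads) are tied up."""
    if mpm in ("prefork", "worker"):
        # One worker stays bound to each connection, even while the
        # connection idles between keep-alive requests.
        return active_requests + idle_keepalive
    if mpm == "event":
        # Idle keep-alive connections are held by the listener thread,
        # so only requests being processed occupy a worker thread.
        return active_requests
    raise ValueError(f"unknown MPM: {mpm}")


# Hypothetical load: 50 requests in flight, 350 connections idling.
for mpm in ("prefork", "worker", "event"):
    print(mpm, busy_workers(active_requests=50, idle_keepalive=350, mpm=mpm))
```

With these made-up numbers, Prefork and Worker need 400 workers to avoid queueing, while Event needs 50.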
3. Module System (mod_*)
Apache's superpower. Modules hook into the request lifecycle at specific phases: post_read_request, translate_name, access_checker, type_checker, handler, log_transaction, etc. Over 60 official modules exist. Key ones:
- mod_ssl — TLS termination via OpenSSL
- mod_rewrite — URL rewriting engine (regex-based, extremely powerful, notoriously abused)
- mod_proxy / mod_proxy_http — reverse proxy and load balancing
- mod_auth_* — authentication (basic, digest, LDAP, etc.)
- mod_deflate — on-the-fly gzip compression
- mod_cache — shared memory and disk-based caching
4. Request Processing Pipeline
Every request travels through a well-defined phase sequence:
- Connection accepted by listener (Event MPM) or worker (Prefork/Worker)
- HTTP parsing → request object created
- Virtual host resolution
- URI translation (mod_rewrite, mod_alias)
- Access control checks
- Authentication → Authorization
- MIME type determination
- Handler invoked (mod_php, mod_wsgi, mod_cgi, or core file handler)
- Response filters applied (compression, transformation)
- Response sent, logged
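The phase sequence above can be sketched as a small hook registry. This is a simplified model, not the httpd API: the class `Core`, the example hook functions, and the abridged phase list are all assumptions made for illustration. Real httpd hooks are C functions, and httpd distinguishes between hooks where the first module to answer wins and hooks where every module runs; this sketch uses only the first style.

```python
from collections import defaultdict

DECLINED = "DECLINED"  # hook has no opinion; let the next module try
OK = "OK"              # hook took responsibility for this phase

# Abridged phase list, using the hook names mentioned in the text.
PHASES = ["post_read_request", "translate_name", "access_checker",
          "type_checker", "handler", "log_transaction"]

LOG_LINES = []  # stands in for the access log


class Core:
    """Toy core: stores hooks per phase and runs them in order."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, phase, callback, order=0):
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append((order, callback))
        self._hooks[phase].sort(key=lambda item: item[0])

    def run_phase(self, phase, request):
        # The first hook that returns anything other than DECLINED wins.
        for _, callback in self._hooks[phase]:
            result = callback(request)
            if result != DECLINED:
                return result
        return DECLINED

    def process(self, request):
        request.setdefault("status", 200)
        for phase in PHASES[:-1]:
            result = self.run_phase(phase, request)
            if result not in (OK, DECLINED):
                request["status"] = result  # e.g. 403: skip remaining phases
                break
        self.run_phase("log_transaction", request)  # logging always runs
        return request


# Hypothetical "modules": each is just a function attached to one phase.
def rewrite_old_urls(request):
    if request["uri"].startswith("/old/"):
        request["uri"] = request["uri"].replace("/old/", "/new/", 1)
    return DECLINED  # let other translate_name hooks run too


def deny_private(request):
    return 403 if request["uri"].startswith("/private/") else DECLINED


def static_handler(request):
    request["body"] = f"contents of {request['uri']}"
    return OK


def access_log(request):
    LOG_LINES.append(f"{request['uri']} {request['status']}")
    return DECLINED


core = Core()
core.register("translate_name", rewrite_old_urls)
core.register("access_checker", deny_private)
core.register("handler", static_handler)
core.register("log_transaction", access_log)

print(core.process({"uri": "/old/page.html"}))
print(core.process({"uri": "/private/key"}))
```

The first request is rewritten to /new/page.html and served with status 200. The second stops at the access check with status 403 and never reaches the handler, but is still logged.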
5. Configuration Engine
Apache's .htaccess system allows per-directory configuration overrides at runtime — useful, but expensive. Every request to a directory triggers a filesystem walk to check for .htaccess files unless AllowOverride None is set. This is one of the most common performance mistakes.
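To see where the cost comes from, the sketch below lists the override files that would have to be checked for a single request. It is an illustration only: the function name and the paths are hypothetical, and it assumes overrides are enabled from `override_root` downward.

```python
from pathlib import PurePosixPath


def htaccess_candidates(override_root: str, requested_file: str) -> list[str]:
    """List every .htaccess location between override_root and the file."""
    root = PurePosixPath(override_root)
    directory = PurePosixPath(requested_file).parent
    # relative_to raises ValueError if the file is outside override_root.
    relative = directory.relative_to(root)

    candidates = [str(root / ".htaccess")]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(str(current / ".htaccess"))
    return candidates


# Hypothetical document root and request path.
for path in htaccess_candidates("/var/www/html",
                                "/var/www/html/blog/2024/post.html"):
    print(path)
```

For this example path, three lookups happen before the request is served, and the count grows with directory depth. With AllowOverride None, none of them are needed.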
6. APR (Apache Portable Runtime)
An abstraction layer beneath everything. Handles memory pools, threading, file I/O, and network operations in a cross-platform way. Memory pools in particular are notable — Apache allocates memory per-request from a pool and frees the whole pool at once rather than tracking individual allocations.
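The pool idea is easy to model. The class below is a toy stand-in written for this article, not the APR API: `Pool`, `alloc`, `register_cleanup`, and `destroy` are hypothetical names, and Python's own garbage collector does the real freeing. What it shows is the ownership pattern: allocations and cleanups are attached to a pool, and destroying the pool releases everything in one step.

```python
class Pool:
    """Toy memory pool: everything allocated here dies with the pool."""

    def __init__(self, parent=None):
        self._buffers = []
        self._cleanups = []
        self._children = []
        self.destroyed = False
        if parent is not None:
            parent._children.append(self)  # child pools die with the parent

    def alloc(self, size: int) -> bytearray:
        buf = bytearray(size)
        self._buffers.append(buf)  # no per-allocation free() is ever needed
        return buf

    def register_cleanup(self, callback):
        self._cleanups.append(callback)

    def destroy(self):
        for child in self._children:
            child.destroy()
        # This sketch runs cleanups in reverse registration order.
        while self._cleanups:
            self._cleanups.pop()()
        self._buffers.clear()
        self.destroyed = True


events = []
connection_pool = Pool()
request_pool = Pool(parent=connection_pool)

request_pool.alloc(4096)                 # e.g. a header buffer
request_pool.alloc(128)                  # e.g. a parsed URI
request_pool.register_cleanup(lambda: events.append("close file handle"))

request_pool.destroy()                   # end of request: one call frees all
print(events, request_pool.destroyed, connection_pool.destroyed)
```

Destroying the request pool leaves the connection pool alive, which mirrors how per-request state can be dropped while a keep-alive connection continues.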
Pros
- Mature and battle-tested — decades of production hardening, security patches, and edge-case handling baked in
- mpm_event handles C10K — Event MPM with keep-alive offloading handles high-concurrency workloads without the per-connection thread overhead of Worker/Prefork
- Unmatched module ecosystem — mod_rewrite, mod_security, mod_proxy, mod_auth_ldap, mod_wsgi — whatever you need, it exists, is documented, and has Stack Overflow answers from 2009
- Per-request memory isolation — APR memory pools mean no cross-request memory leaks; each request cleans up after itself
- Virtual host flexibility — name-based, IP-based, and port-based virtual hosting with granular per-vhost config
- Gradual migration path — run as a reverse proxy in front of newer stacks (Node, Python) without replacing your existing setup overnight
- .htaccess for delegated config — lets application developers control routing/security without touching the main server config (double-edged, but genuinely useful in shared hosting)
Cons
- Memory per-process (Prefork) — each child process loads the full Apache binary + all modules into memory. Serving 200 concurrent connections with Prefork can consume gigabytes
- Blocking I/O model (Prefork/Worker) — each active request occupies one thread or process until it completes, so slow clients or high concurrency can exhaust the worker pool. Event MPM offloads idle keep-alive connections but does not change this for requests in flight
- .htaccess performance tax — unless explicitly disabled, every request walks the filesystem checking for override files. Death by a thousand syscalls on busy servers
- Module conflicts are opaque — two modules hooking the same phase with conflicting logic fail silently or produce bizarre behavior that's extremely difficult to debug
- Config syntax is hostile — httpd.conf is declarative but not composable. Large configs become unmaintainable. There's no native include-by-convention, no variable interpolation beyond Define, no real templating
- Not async-native — unlike NGINX's event loop or Node's non-blocking I/O, Apache's architecture was not designed for async. Event MPM is a patch on a fundamentally synchronous model
- Dynamic module loading costs — loading shared objects (.so files) at startup means startup time grows with module count, and misconfigured dynamic modules crash the whole server on boot
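The Prefork memory point in the list above is simple arithmetic, sketched here. The 40 MB per-process figure is an assumption for illustration, not a measurement. Real usage depends on which modules are loaded (an embedded interpreter such as mod_php adds a lot), and pages shared between processes are not double-counted by the kernel.

```python
def prefork_memory_gib(processes: int, private_mb_per_process: float) -> float:
    """Rough estimate: private memory only, shared pages ignored."""
    return processes * private_mb_per_process / 1024


# Assumed: 200 concurrent connections, so 200 processes at 40 MB each.
estimate = prefork_memory_gib(200, 40)
print(f"{estimate:.1f} GiB")  # 200 * 40 / 1024 = 7.8125, prints "7.8 GiB"
```

Measuring the private memory of a running child process on the target system and substituting that figure gives a more useful estimate than any rule of thumb.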
Real Usage Scenarios
1. Shared Web Hosting (Classic Use Case)
This is where Apache remains dominant. cPanel/WHM stacks have traditionally run Apache with Prefork + mod_php (the in-process PHP module, not PHP-FPM) because each request runs in its own process: one customer's runaway PHP script crashes that process, not everyone's. .htaccess lets customers configure rewrites and password protection without SSH access. Technically suboptimal, but operationally safe for untrusted multi-tenant environments.
2. Enterprise Intranet / Legacy Application Hosting
Enterprises running Java apps via mod_jk (Tomcat connector), Perl via mod_perl, or Python via mod_wsgi don't migrate to NGINX because the integration is already solved. The cost of migration doesn't justify the marginal performance gain when your app is the bottleneck, not the web server.
3. Reverse Proxy in Front of Microservices
mod_proxy_balancer with mod_proxy_http turns Apache into a capable reverse proxy with sticky sessions and weighted balancing, plus active health checks when mod_proxy_hcheck is loaded. Teams running mixed stacks (some PHP, some Node) use Apache as the unified frontend that routes by hostname or path, avoiding an additional NGINX layer.
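Weighted balancing can be sketched in a few lines. The version below is loosely modelled on the request-counting approach described in the mod_lbmethod_byrequests documentation; it is a Python illustration, not httpd's implementation, and the backend names and weights are made up. Each backend has a fixed weight (`lbfactor`) and a running score (`lbstatus`); the backend with the highest score gets the request and then pays back the total weight.

```python
def pick_worker(workers: dict) -> str:
    """Choose the next backend; mutates each worker's lbstatus in place."""
    if not workers:
        raise ValueError("no workers configured")

    total_factor = 0
    candidate = None
    for name, worker in workers.items():
        worker["lbstatus"] += worker["lbfactor"]
        total_factor += worker["lbfactor"]
        # Strict comparison: on a tie, the earlier worker is kept.
        if candidate is None or worker["lbstatus"] > workers[candidate]["lbstatus"]:
            candidate = name

    workers[candidate]["lbstatus"] -= total_factor
    return candidate


# Hypothetical backends: "a" should receive twice the traffic of "b".
backends = {
    "a": {"lbfactor": 2, "lbstatus": 0},
    "b": {"lbfactor": 1, "lbstatus": 0},
}
print([pick_worker(backends) for _ in range(6)])
# ['a', 'b', 'a', 'a', 'b', 'a']
```

Over six requests, "a" is chosen four times and "b" twice, matching the 2:1 weights, and the picks are interleaved rather than sent in bursts.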
4. SSL Termination Layer
mod_ssl with OpenSSL supports SNI (Server Name Indication), session resumption, OCSP stapling, and modern cipher suites. In environments where operations teams are already Apache-fluent, it's simpler to terminate TLS here than introduce HAProxy or NGINX just for SSL.
Why Apache Over NGINX Here?
NGINX wins on raw concurrency for static file serving and simple proxying. Apache wins when you need per-directory config flexibility, deep module integration (especially for legacy apps), or .htaccess-based delegation. Choosing Apache isn't a performance mistake — it's usually a correctness decision for the operational context.
