Every engineer has seen this question in interviews. Most give a vague hand-wavy answer about "DNS and TCP." Here's what's actually going on — every layer, every handshake, no padding.
The browser's first move: parse and cache check
Before a single packet leaves your machine, the browser parses the URL into its parts: protocol (https), host (example.com), path (/products), and any query params. Then it goes hunting for a cached answer — in order:
- Browser memory cache — did you visit this 2 seconds ago?
- OS DNS cache — `nscd` or the Windows DNS client resolver cache
- Hosts file — `/etc/hosts` on Linux/Mac, `C:\Windows\System32\drivers\etc\hosts` on Windows
Only on a cache miss does it go to the network. For a first-ever visit to a site, you're going all the way down.
DNS: mapping a name to an IP
DNS is a distributed, hierarchical key-value store. Your machine doesn't know where example.com lives — it asks a recursive resolver (usually your ISP's, or 8.8.8.8 if you've configured Google's). The resolver then does the heavy lifting:
- Asks a root nameserver (`a.root-servers.net`, etc.) — "who owns `.com`?"
- Root returns the address of the TLD nameserver for `.com`
- Resolver asks the TLD NS — "who owns `example.com`?"
- TLD NS returns the authoritative nameserver for `example.com`
- Authoritative NS returns the actual A record — an IPv4 address like `93.184.216.34`
The entire chain is cached at each layer with a TTL. A low TTL (60s) means DNS changes propagate fast but every request pays the lookup cost. A high TTL (86400s = 24h) is cheap on queries but slow to update.
On a warm cache, this whole dance is skipped. On a cold one, it adds 20–120ms.
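The client side of this lookup is one call into the OS resolver, which consults its cache before going to the network. A minimal sketch using the standard library (`localhost` is used only because it resolves everywhere via the hosts file):

```python
import socket
import time

def resolve(hostname):
    """Ask the OS resolver (cache first, then the configured
    recursive resolver) for the host's IPv4 A records."""
    start = time.perf_counter()
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples
    infos = socket.getaddrinfo(hostname, 443,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    elapsed_ms = (time.perf_counter() - start) * 1000
    ips = sorted({sockaddr[0] for *_, sockaddr in infos})
    print(f"{hostname} -> {ips} ({elapsed_ms:.1f} ms)")
    return ips

resolve("localhost")  # hosts-file hit: no network involved
```

Running this twice against a real hostname makes the cold/warm difference from the timings above directly visible.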
TCP: the connection contract
You have an IP. Now you need a reliable byte stream. TCP handles that with a three-way handshake:
```
Client → Server: SYN      (seq=100)
Server → Client: SYN-ACK  (seq=200, ack=101)
Client → Server: ACK      (ack=201)
```

This round trip establishes sequence numbers on both sides so every segment can be tracked, reordered, and retransmitted if lost. The handshake alone costs one RTT — on a cross-continental connection that's 150–200ms before you've sent a single byte of HTTP.
HTTP/2 and HTTP/3 reduce this cost in different ways. HTTP/2 multiplexes many requests over one connection, so the handshake is paid once rather than per request. HTTP/3 over QUIC goes further: it skips the TCP handshake entirely and combines connection establishment with TLS in a single round trip.
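The handshake cost is easy to observe: `connect()` doesn't return until the three-way exchange completes, so timing it approximates one RTT. A self-contained sketch against a local listener (loopback, so the number will be tiny; point it at a remote host to see real RTTs):

```python
import socket
import time

def measure_handshake(host, port):
    """Time a TCP connect; connect() returns once the three-way
    handshake (SYN, SYN-ACK, ACK) completes, so this is ~1 RTT."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; we never send any HTTP bytes
    return (time.perf_counter() - start) * 1000

# Local stand-in server: the kernel completes the handshake from the
# listen backlog even though we never call accept().
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: OS picks a free port
server.listen(1)
_, port = server.getsockname()
rtt_ms = measure_handshake("127.0.0.1", port)
print(f"handshake took {rtt_ms:.2f} ms")
server.close()
```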
TLS: encrypting the channel
For HTTPS (which is everything now), a TLS handshake happens on top of TCP. TLS 1.3, which is standard today, takes one round trip:
- ClientHello — client sends supported cipher suites and a key share
- ServerHello + Certificate + Finished — server picks cipher, proves identity with its cert, sends encrypted Finished
- Client Finished — confirms and the encrypted channel is open
Older TLS 1.2 needed two round trips. TLS 1.3 also supports 0-RTT resumption — if you've connected before, you can send application data in the very first packet. Slight security trade-off (replay attacks), but most CDNs use it for performance.
Certificate validation involves checking the cert chain up to a trusted CA root, and optionally querying OCSP for revocation status. Certificate transparency logs are checked too — Chrome requires it.
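Python's `ssl` module exposes these same validation steps. `create_default_context()` configures a client the way a browser behaves: verify the chain against trusted CA roots and check the hostname. A sketch (the commented-out connection assumes network access, so it isn't run here):

```python
import ssl

# A client-side TLS context with browser-like defaults: chain
# verification against the system CA store plus hostname checking.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older

print(ctx.check_hostname)   # hostname must match the certificate
print(ctx.verify_mode)      # certificate chain must validate

# The actual one-RTT TLS 1.3 handshake wraps a connected TCP socket:
# import socket
# with socket.create_connection(("example.com", 443)) as tcp:
#     with ctx.wrap_socket(tcp, server_hostname="example.com") as tls:
#         print(tls.version(), tls.cipher())  # e.g. TLSv1.3 + AES-GCM
```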
HTTP request: what you actually send
With an encrypted TCP connection open, the browser sends an HTTP request:
```
GET /products HTTP/1.1
Host: example.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml
Accept-Encoding: gzip, br
Cookie: session=abc123; pref=dark
Cache-Control: max-age=0
```

A few things worth noting here. `Host` is mandatory in HTTP/1.1 — it's how virtual hosting works (one server, many domains). `Accept-Encoding: br` means the client supports Brotli compression, which is 15–25% better than gzip on text. And cookies travel with every request to the same origin — a major reason third-party cookies are a privacy concern.
With HTTP/2, this request is binary-framed and multiplexed — multiple requests can be in-flight over a single connection with no head-of-line blocking at the HTTP layer.
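The request above can be reproduced with the standard library's `http.client`. To keep the round trip self-contained, this sketch spins up a tiny local server as a stand-in for the origin (handler and body are illustrative):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    """Stand-in origin: answers every GET with a small HTML body."""
    def do_GET(self):
        body = b"<html>ok</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Same headers the article shows; explicit Host demonstrates the
# virtual-hosting selector (http.client would otherwise fill it in).
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/products", headers={
    "Host": "example.com",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Encoding": "gzip, br",
    "Cookie": "session=abc123; pref=dark",
})
resp = conn.getresponse()
print(resp.status, resp.reason)
html = resp.read()
conn.close()
server.shutdown()
```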
Server-side: what happens in the black box
The request hits the server. In a typical production setup, it first lands at a load balancer (nginx, HAProxy, AWS ALB) which distributes traffic across a fleet of app servers. The load balancer also terminates TLS so backend servers don't carry that overhead.
The app server (Node.js, Rails, Django, Go — whatever) receives the HTTP request and runs your business logic. That usually means:
- Validating session/auth (JWT check, cookie lookup)
- Querying a database (Postgres, MySQL) for dynamic data
- Hitting a cache layer (Redis, Memcached) to avoid the DB entirely on hot paths
- Calling downstream microservices or external APIs
A cache hit on Redis for a hot endpoint can take the DB out of the path entirely — going from 50ms to 1ms for data retrieval. This is where the biggest performance wins usually live.
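The pattern behind that win is cache-aside: check the cache, fall through to the database on a miss, then populate the cache. A sketch with a plain dict standing in for Redis and a slow function for Postgres (all names are illustrative, not a real client API):

```python
import time

cache = {}          # stand-in for Redis: key -> (stored_at, value)
CACHE_TTL_S = 60.0  # expire entries after a minute

def query_db(product_id):
    """Simulate a ~50 ms database round trip."""
    time.sleep(0.05)
    return f"product:{product_id}"

def get_product(product_id):
    """Cache-aside read: ~1 ms on a hit, ~50 ms on a miss."""
    key = f"product:{product_id}"
    hit = cache.get(key)
    if hit and time.monotonic() - hit[0] < CACHE_TTL_S:
        return hit[1]                      # hot path: DB never touched
    value = query_db(product_id)           # cold path
    cache[key] = (time.monotonic(), value) # populate for next time
    return value

get_product("42")            # cold read, pays the DB cost
start = time.perf_counter()
warm_value = get_product("42")  # warm read, served from cache
warm_ms = (time.perf_counter() - start) * 1000
print(f"warm read: {warm_ms:.3f} ms")
```

With a real Redis you'd also set a TTL on the key (`SETEX`) so stale data ages out server-side instead of in application code.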
HTTP response: what comes back
The server returns:
```
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Encoding: br
Cache-Control: public, max-age=3600
ETag: "abc123"
Strict-Transport-Security: max-age=31536000

<html>...
```

Key headers here: `Cache-Control` tells the browser how long to cache this. `ETag` lets the browser send a conditional `If-None-Match` on the next request — if unchanged, the server returns a `304 Not Modified` with no body. HSTS tells the browser to never connect over plain HTTP again. The body is typically Brotli-compressed HTML.
Browser rendering: turning bytes into pixels
The HTML arrives and the browser's rendering pipeline kicks in:
- HTML Parser → DOM — bytes into a tree of nodes
- CSS Parser → CSSOM — stylesheets into a cascade-resolved rule tree
- Render tree — DOM + CSSOM merged; only visible nodes
- Layout (Reflow) — every node gets a position and size on the viewport
- Paint — rasterize each layer into pixel bitmaps
- Compositing — GPU assembles layers, handles transforms and opacity
A synchronous `<script>` halts HTML parsing while it downloads and executes (the script may mutate the DOM the parser is building), which is why `<script defer>` and `<script async>` matter. CSS in `<link>` blocks rendering until the stylesheet is downloaded and parsed — render-blocking by design.
First Contentful Paint (FCP) is when the first pixel appears. Largest Contentful Paint (LCP) is the main performance metric for real user experience. Both are dominated by how fast the server responds and how heavy the critical rendering path is.
Numbers that matter
| Phase | Typical time (warm cache, same continent) |
|---|---|
| DNS lookup | 0ms (cached) to 100ms |
| TCP handshake | 1× RTT (10–150ms) |
| TLS handshake (1.3) | 1× RTT |
| HTTP request + response | 1× RTT + server time |
| Browser rendering | 50–300ms |
| Total TTFB | ~80–400ms |
A fast site hits FCP under 1 second end-to-end. CDN edge nodes sitting close to users, HTTP/3, aggressive caching, and a lean rendering path are the main levers.
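The table above is just addition. A quick budget calculation, assuming the 60 ms RTT figure used later for 4G and an assumed 50 ms of server time and 40 ms cold DNS lookup:

```python
# Cold vs warm TTFB from the per-phase costs in the table.
rtt_ms = 60     # assumed mobile/4G round-trip time
dns_ms = 40     # cold recursive lookup (within the 20-120 ms range)
server_ms = 50  # assumed server processing time

tcp = rtt_ms                 # three-way handshake: 1 RTT
tls = rtt_ms                 # TLS 1.3 handshake: 1 RTT
http = rtt_ms + server_ms    # request/response: 1 RTT + server time

cold_ttfb = dns_ms + tcp + tls + http
warm_ttfb = http             # DNS cached, connection kept alive
h3_ttfb = dns_ms + rtt_ms + http  # QUIC: transport + TLS in one RTT

print(f"cold: {cold_ttfb} ms, warm: {warm_ttfb} ms, cold HTTP/3: {h3_ttfb} ms")
```

Connection reuse alone removes two round trips here, which is exactly the lever the failure-points list below keeps returning to.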
Where things go wrong (common failure points)
- DNS TTL too low — every user pays full DNS lookup time on every visit
- No HSTS preloading — first request goes over HTTP, redirects to HTTPS, wastes a round trip
- Render-blocking JS — sync scripts halt HTML parsing; defer everything you can
- No connection reuse — HTTP/1.1 without keep-alive opens a new TCP+TLS per request
- Uncached DB queries on hot paths — Redis exists for a reason
- Large uncompressed assets — Brotli your text, serve WebP, lazy-load below-the-fold images
Key takeaways
The journey from URL to rendered page is a layered protocol stack — DNS, TCP, TLS, HTTP, server logic, rendering engine — each adding latency that compounds. The systems that feel instant have typically eliminated every redundant round trip: DNS cached at the edge, TLS 1.3 with 0-RTT, HTTP/3, CDN-cached assets, aggressive browser caching, and server-side Redis for hot data.
Every millisecond of TTFB you cut translates directly to user perception. On mobile over 4G with 60ms RTT, a two-round-trip savings is 120ms — the difference between "fast" and "noticeable."
