Deep dive into How Web Browsers work (with illustrations) ⚙️🚀

7 min read · Mar 19, 2023

Browsers are now a part of everyday life, but have you ever wondered how they work under the hood?

This article will take a closer look at the magic behind the scenes of web browsers.

Let’s get started! 🚀

1. Navigation

Navigation is the first step of loading a web page. It happens when the user enters a URL in the address bar or clicks on a link.

1.1. DNS lookup

The first step is to find the IP address where the resources are located. This is done by a DNS lookup.

The Domain Name System (DNS) matches website hostnames (like www.example.com) to their corresponding Internet Protocol (IP) addresses. The browser asks a DNS server, which looks up the records that map domain names to public IP addresses and returns the answer.

For example, if you visit www.example.com, the DNS server returns the IP address that the hostname maps to, such as 93.184.216.34 (the exact address can change over time).
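
To make the lookup above concrete, here is a minimal Python sketch that asks the operating system's resolver for the addresses behind a hostname. The helper name resolve is an assumption made for illustration. A browser has its own DNS cache and resolver logic, so this only approximates what happens under the hood.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo delegates to the operating system, which may answer from a
    # local cache, the hosts file, or by querying a DNS server.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # first element of the socket address is the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve("www.example.com") needs network access, and the addresses it returns depend on when and where you run it.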

1.2. 3-way TCP Handshake

The next step is to establish a TCP connection with the server. This is done by a 3-way TCP handshake.

First, the client sends a request to open up a connection to the server with a SYN packet.

The server then responds with a SYN-ACK packet, which acknowledges the client’s request and asks the client to open a connection in the other direction.

Finally, the client sends an ACK packet to the server acknowledging the server’s request.
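
The three packets above can be sketched as a small simulation. The Segment class and the initial sequence numbers are made up for illustration. In practice the operating system performs the handshake when a socket connects, and real TCP segments carry many more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    flags: str            # control flags that are set, e.g. "SYN" or "SYN-ACK"
    seq: int              # sender's sequence number
    ack: Optional[int]    # next sequence number expected from the peer, if any


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged to open a connection."""
    # 1. The client proposes its initial sequence number (ISN).
    syn = Segment("SYN", seq=client_isn, ack=None)
    # 2. The server acknowledges it (ISN + 1) and proposes its own ISN.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. The client acknowledges the server's ISN.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each side acknowledges the other's sequence number plus one, which is how both ends confirm that they heard each other.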

1.3. TLS handshake

If the website uses HTTPS (encrypted HTTP protocol), the next step is to establish a TLS connection via a TLS handshake.

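During this step, some more messages are exchanged between the browser and the server. The steps below describe the RSA key-exchange handshake used by TLS 1.2 and earlier; TLS 1.3 shortens the handshake and uses a Diffie-Hellman key exchange instead.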

Client says hello: The browser sends the server a message that includes which TLS versions and cipher suites it supports and a string of random bytes known as the client random.
Server hello message and certificate: The server sends a message back containing the server’s TLS certificate (often still called an SSL certificate), the server’s chosen cipher suite, and the server random (a random string of bytes that’s generated by the server).
Authentication: The browser verifies the server’s certificate against the certificate authority that issued it. This way the browser can be sure that the server is who it says it is.
The premaster secret: The browser sends one more random string of bytes called the premaster secret, which is encrypted with the public key that the browser takes from the server’s certificate. The premaster secret can only be decrypted by the server with its private key.
Private key used: The server decrypts the premaster secret.
Session keys created: The browser and server generate session keys from the client random, the server random, and the premaster secret.
Client finished: The browser sends a message to the server saying it has finished.
Server finished: The server sends a message to the browser saying it has also finished.
Secure symmetric encryption achieved: The handshake is completed and communication can continue using the session keys.
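
To illustrate the "session keys created" step, here is a hedged Python sketch of how both sides can turn the client random, the server random, and the premaster secret into the same shared secret. It is modeled on the pseudorandom function of TLS 1.2 (RFC 5246) but has not been checked against official test vectors, and the function names are assumptions. Real TLS goes on to expand this master secret into several separate keys, so treat this as an illustration only.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret and seed into `length` bytes using chained HMAC-SHA256."""
    output = b""
    a = seed  # A(0)
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(premaster: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Derive a 48-byte master secret from the three handshake values."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(premaster, seed, 48)


# Values that would be exchanged during the handshake (random here).
client_random = os.urandom(32)
server_random = os.urandom(32)
premaster = os.urandom(48)

# Browser and server run the same derivation independently.
browser_secret = derive_master_secret(premaster, client_random, server_random)
server_secret = derive_master_secret(premaster, client_random, server_random)
print(browser_secret == server_secret)
```

Because both sides feed in identical inputs, they arrive at the same secret without ever sending it over the network.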

Now requesting and receiving data from the server can begin.

2. Fetching resources

After the connection is established (TCP, plus TLS for HTTPS), the browser can start fetching resources from the server.

2.1. HTTP Request

If you have any experience with web development, you will have encountered the concept of HTTP requests.

HTTP requests are used to fetch resources from the server. A request requires a URL and a method that states the type of request (such as GET, POST, PUT, or DELETE). The browser also adds headers to the request to provide additional context.

The first request sent to a server is usually a GET request to fetch an HTML file.
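
As a rough sketch, this is what such a first GET request could look like on the wire in HTTP/1.1. The helper build_get_request and the User-Agent value are hypothetical; real browsers send many more headers and often speak HTTP/2 or HTTP/3, which use a binary format.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",          # request line: method, path, version
        f"Host: {host}",                 # Host is mandatory in HTTP/1.1
        "Accept: text/html",             # the kind of content we would like
        "User-Agent: toy-browser/0.1",   # hypothetical client name
        "Connection: close",
    ]
    # Each header line ends with CRLF; an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("www.example.com").decode("ascii"))
```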

2.2. HTTP Response

The server then responds with an appropriate HTTP response for the given request.

The response contains the status code, the headers & the body.
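
The three parts of a response can be pulled apart with a few lines of Python. This is a simplified sketch for a plain HTTP/1.1 response: the function name parse_response is an assumption, and it ignores details such as chunked transfer encoding and repeated headers.

```python
def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into (status code, headers, body)."""
    # The header section and the body are separated by an empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_code, headers, body


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hello World!</p>"
status, headers, body = parse_response(raw)
print(status, headers, body)
```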

3. Parsing HTML

Now comes the main section. After the browser has received the HTML file, it parses it to generate the DOM (Document Object Model) tree.

This is done by the browser engine, which is the core of the browser (e.g. Gecko for Firefox, WebKit for Safari, Blink for Chrome).

Here is an example HTML file:

<!DOCTYPE html>
<html>
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

3.1. Tokenization

The first step for displaying the web page is to tokenize the HTML file. Tokenization is the process of breaking up a string of characters into meaningful chunks for the browser, called tokens.

Tokens are the basic building blocks of the DOM tree.
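
To get a feel for tokenization, the sketch below uses Python's built-in html.parser module to turn markup into a flat list of tokens. The Tokenizer class and the token names are assumptions made for illustration. A browser's tokenizer follows the much more detailed state machine of the HTML specification.

```python
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Collect a flat list of (kind, value) tokens from HTML text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_decl(self, decl):
        self.tokens.append(("doctype", decl))

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags to keep the output short.
        if data.strip():
            self.tokens.append(("text", data.strip()))


def tokenize(html: str):
    tokenizer = Tokenizer()
    tokenizer.feed(html)
    tokenizer.close()  # flush any buffered text
    return tokenizer.tokens


for token in tokenize("<body><p>Hello World!</p></body>"):
    print(token)
```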

3.2. DOM Tree construction

Tree construction is the process of converting the sequence of tokens into a tree structure called the DOM tree.

The DOM tree is a tree data structure that represents the nodes in the HTML document.
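
A common way to build a tree from a token stream is to keep a stack of open elements, which the following Python sketch does. The Node and TreeBuilder classes are hypothetical, and the sketch ignores void elements such as br as well as the error recovery that real browsers perform.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a simplified DOM tree using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_elements = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.open_elements[-1].children.append(node)  # attach to current parent
        self.open_elements.append(node)               # it becomes the new parent

    def handle_endtag(self, tag):
        # Close the current element only if the end tag matches it.
        if len(self.open_elements) > 1 and self.open_elements[-1].tag == tag:
            self.open_elements.pop()

    def handle_data(self, data):
        if data.strip():
            self.open_elements[-1].children.append(Node("#text", data.strip()))


def build_dom(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.text if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = "<html><head><title>Page Title</title></head><body><p>Hello World!</p></body></html>"
print("\n".join(outline(build_dom(page))))
```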

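NOTE: If the page requires any external resources, they are handled as follows:

Non-blocking resources are fetched in parallel and do not pause the parser. E.g. images.
Deferred resources are fetched in parallel but executed only after the DOM tree is constructed. E.g. a script WITH the defer attribute.
Blocking resources pause the parser while they are fetched and executed, in document order. E.g. a script WITHOUT the defer or async attribute.
CSS files are fetched in parallel and do not pause HTML parsing, but they block rendering until the CSSOM is ready.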

4. Parsing CSS

The browser also parses the CSS (from CSS files and style elements) to generate the CSSOM (CSS Object Model).

This process is similar to DOM tree construction: the CSS is tokenized, and the tokens are then used to build the CSSOM.
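
As a very rough illustration, the Python sketch below turns a stylesheet into a list of rules, each with a selector and its declarations. The parse_css helper is an assumption, and it handles only flat "selector { property: value; }" rules. A real CSS parser also deals with comments, at-rules, nesting, and the cascade.

```python
import re

# One rule: everything before "{" is the selector, everything inside is the block.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(css: str) -> list[tuple[str, dict[str, str]]]:
    """Parse flat CSS rules into (selector, declarations) pairs."""
    rules = []
    for selector, block in RULE.findall(css):
        declarations = {}
        for declaration in block.split(";"):
            name, separator, value = declaration.partition(":")
            if separator:  # ignore empty fragments after the last ";"
                declarations[name.strip()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


stylesheet = """
body { margin: 0; font-family: sans-serif; }
p { color: red; }
"""
for selector, declarations in parse_css(stylesheet):
    print(selector, declarations)
```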

5. Executing JavaScript

As mentioned previously, a blocking script is fetched and executed immediately, and DOM tree construction is paused while that happens. A deferred script is fetched in parallel and executed after DOM tree construction is complete.

Regardless of when the script is executed, it is handled by the JavaScript engine, which, like the browser engine, varies from browser to browser (for example, V8 in Chrome, SpiderMonkey in Firefox, and JavaScriptCore in Safari).

5.1. JIT compilation

Assuming you are familiar with the concept of interpreters & compilers, the JavaScript engine uses a hybrid approach called JIT (Just In Time) compilation.

Unlike a compiled language such as C, where compilation is done ahead of time (in other words, before the code is actually executed), JavaScript is compiled during execution. Typically the engine starts by interpreting the code and compiles the parts that run often into machine code.
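
The idea of compiling during execution can be mimicked with a toy in Python. The sketch below interprets a small expression tree for the first few calls and, once the function becomes "hot", compiles it into a Python lambda. Everything here (the tuple format, HotFunction, the threshold) is invented for illustration. Real JavaScript engines emit machine code and can fall back to the interpreter when their assumptions turn out to be wrong.

```python
def interpret(expr, env):
    """Evaluate an expression tree by walking it on every call (slow path)."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if kind == "add" else left * right


def to_source(expr):
    """Translate an expression tree into Python source text."""
    kind = expr[0]
    if kind == "const":
        return repr(expr[1])
    if kind == "var":
        return f"env[{expr[1]!r}]"
    operator = "+" if kind == "add" else "*"
    return f"({to_source(expr[1])} {operator} {to_source(expr[2])})"


class HotFunction:
    """Interpret at first, then switch to a compiled version once hot."""

    def __init__(self, expr, threshold=3):
        self.expr = expr
        self.threshold = threshold
        self.calls = 0
        self.compiled = None

    def __call__(self, env):
        if self.compiled is not None:
            return self.compiled(env)          # fast path
        self.calls += 1
        if self.calls >= self.threshold:       # hot: compile for future calls
            self.compiled = eval("lambda env: " + to_source(self.expr))
        return interpret(self.expr, env)


# x + 2 * y
f = HotFunction(("add", ("var", "x"), ("mul", ("const", 2), ("var", "y"))))
print([f({"x": 1, "y": y}) for y in range(5)])
```

The caller never notices the switch: the result is the same, only the way it is computed changes.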

6. Rendering

It’s finally time to render the page. The browser uses the DOM tree & CSSOM to render the page.

6.1. Render tree construction

The first step is to construct the render tree by combining the DOM tree with the CSSOM. The render tree contains only the elements that are visible on the page, so elements such as those styled with display: none are left out.
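
A minimal sketch of this filtering step, assuming each DOM node already carries its computed style as a plain dictionary (the Node class and build_render_tree are hypothetical names):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    style: dict = field(default_factory=dict)      # computed style, simplified
    children: list = field(default_factory=list)


def build_render_tree(node):
    """Copy the tree, dropping every subtree styled with display: none."""
    if node.style.get("display") == "none":
        return None
    rendered_children = []
    for child in node.children:
        rendered = build_render_tree(child)
        if rendered is not None:
            rendered_children.append(rendered)
    return Node(node.tag, dict(node.style), rendered_children)


dom = Node("html", children=[
    Node("head", {"display": "none"}, [Node("title")]),
    Node("body", children=[
        Node("p"),
        Node("div", {"display": "none"}),
        Node("span", {"visibility": "hidden"}),
    ]),
])
render_tree = build_render_tree(dom)
print([child.tag for child in render_tree.children[0].children])
```

Note that the span with visibility: hidden stays in the tree: it is invisible but still takes up space, so it has to take part in layout.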

6.2. Layout

The next step is to lay out the render tree. This is done by calculating the exact size & position of each element in the render tree.

This step happens every time we change something in the DOM that affects the layout of the page, even partially.

Examples of situations when the positions of the elements are recalculated are:

Adding or deleting elements from the DOM
Resizing the browser window
Changing the width, height, or the position of an element
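
The layout calculation can be sketched with a toy model in which every box takes the full width of its parent and children are stacked vertically. The Box class and layout function are assumptions for illustration. Real layout engines handle inline content, floats, flexbox, grid, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    name: str
    height: int = 0                                 # own height in px (used for leaves)
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0


def layout(box, x, y, width):
    """Position box at (x, y) with the given width and return its height."""
    box.x, box.y, box.width = x, y, width
    if box.children:
        cursor = y
        for child in box.children:
            # Each child starts where the previous one ended.
            cursor += layout(child, x, cursor, width)
        box.height = cursor - y                     # a parent is as tall as its content
    return box.height


body = Box("body", children=[Box("h1", height=40), Box("p", height=20)])
layout(body, x=0, y=0, width=800)
for box in [body, *body.children]:
    print(box.name, box.x, box.y, box.width, box.height)
```

Resizing the window corresponds to calling layout again with a new width, which touches every box in the tree. That is why layout changes are costly.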

6.3. Painting

Once the browser has decided which nodes need to be visible and has calculated their position in the viewport, it’s time to paint them (render the pixels) on the screen. This phase is also known as the rasterization phase, where the browser converts each element calculated in the layout phase to actual pixels on the screen.

Just like the layout phase, this phase happens every time we change the appearance of an element in the DOM, even partially.

Examples of situations when elements are repainted are:

Changing the outline of an element
Changing the opacity or visibility of an element
Changing the background color of an element

6.4. Layering & Compositing

The final step is to composite the layers. This is done by the browser to optimize the rendering process.

Compositing is a technique that separates parts of a page into layers, paints them separately, and combines them into the final page in a separate thread called the compositor thread. When sections of the document are drawn in different layers that overlap each other, compositing is necessary to ensure they are drawn to the screen in the right order and the content is rendered correctly.
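
The ordering problem that compositing solves can be shown with a one-dimensional toy: each layer is a strip of already painted "pixels", and the layers are combined from back to front. The layer format and the composite function are invented for this sketch. Real compositors work on 2D textures, usually on the GPU.

```python
def composite(layers, width):
    """Flatten 1-D layers into one row of pixels, lowest z first."""
    frame = ["."] * width                           # "." is the empty background
    for layer in sorted(layers, key=lambda item: item["z"]):
        for offset, pixel in enumerate(layer["pixels"]):
            position = layer["x"] + offset
            # A space is treated as a transparent pixel.
            if 0 <= position < width and pixel != " ":
                frame[position] = pixel
    return "".join(frame)


layers = [
    {"z": 2, "x": 3, "pixels": "BBB"},              # painted on top
    {"z": 1, "x": 1, "pixels": "AAAA"},             # painted underneath
]
print(composite(layers, width=8))
```

Moving the top layer only requires compositing again with a different x value. Neither layer has to be repainted, which is what makes layers useful for animations.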

NOTE: DOM updates that trigger layout & paint are expensive operations, which is especially noticeable on low-end devices. So, it’s important to minimize the number of times they are triggered.
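
One common way to trigger layout less often is to group reads and writes instead of interleaving them. The Python model below is only a simulation with made-up names (FakePage, interleaved, batched). It assumes that reading geometry right after a style change forces the browser to run layout immediately, and it counts how often that happens.

```python
class FakePage:
    """Toy page that counts how many times layout has to run."""

    def __init__(self):
        self.layout_runs = 0
        self._dirty = False

    def write_style(self):
        self._dirty = True                  # layout is now out of date

    def read_geometry(self):
        if self._dirty:                     # a read needs fresh layout
            self.layout_runs += 1
            self._dirty = False


def interleaved(page, count):
    for _ in range(count):
        page.write_style()
        page.read_geometry()                # forces layout on every iteration


def batched(page, count):
    for _ in range(count):
        page.read_geometry()                # all reads first: layout is clean
    for _ in range(count):
        page.write_style()                  # then all writes
    page.read_geometry()                    # a single layout at the end


slow, fast = FakePage(), FakePage()
interleaved(slow, 10)
batched(fast, 10)
print(slow.layout_runs, fast.layout_runs)
```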

That’s all folks! 🎉
