Last week, I was working on our URL shortener app, so I figured it would be fun to write about how these tools actually work.

Most famous cheat codes

To start the discussion, let’s recap HTTP status codes. A server returns one with every HTTP response, and the code itself already tells us the result of the operation. Unless, of course, the server was coded by somebody who just wants to see the world burn.

Here are some of the most common codes you can encounter in the wild.

200 - OK
201 - Created
301 - Moved Permanently
302 - Found (Moved Temporarily)
400 - Bad Request
401 - Unauthorized
404 - Not Found
500 - Internal Server Error

The cool thing about their structure is that the first digit tells us which category of response a code belongs to.

From the list above, 2xx means success, 3xx means redirect, 4xx means failed due to what the caller did, and 5xx means failed due to what the server did.
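The first-digit rule is simple enough to write down as a function. This is only a small sketch: the name statusCategory and the category labels are my own, and 1xx (informational) is included for completeness even though it is not in the list above.

```javascript
// Map an HTTP status code to its category using the first digit.
function statusCategory(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`Not a valid HTTP status code: ${code}`);
  }
  const categories = {
    1: 'informational',
    2: 'success',
    3: 'redirect',
    4: 'client error',
    5: 'server error',
  };
  // Integer division by 100 leaves just the first digit.
  return categories[Math.floor(code / 100)];
}

console.log(statusCategory(200)); // success
console.log(statusCategory(301)); // redirect
console.log(statusCategory(404)); // client error
```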

Proof over promises

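It is easy to check which HTTP status code a given URL returns. curl is a command-line tool that does exactly that, and it comes with macOS and most Linux distributions. There are also some online tools for this.

We can run a basic example. The -I flag tells curl to fetch only the response headers: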

$ curl -I https://beaver.codes
HTTP/2 200 
date: Mon, 22 Apr 2024 18:36:05 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
link: <https://beaver.codes/wp-json/>; rel="https://api.w.org/"
link: <https://beaver.codes/wp-json/wp/v2/pages/2>; rel="alternate"; type="application/json"
link: <https://beaver.codes/>; rel=shortlink
server: nginx
x-cache-status: disabled

The first line of the output carries the HTTP status code. In this case it is 200, which, as we saw in the list above, means OK. It’s great to see our website working.

Let’s try a slightly more complex example by querying the site’s non-secure http:// version.

$ curl -I http://beaver.codes
HTTP/1.1 301 Moved Permanently
Date: Mon, 22 Apr 2024 18:41:20 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
Location: https://beaver.codes/
Server: nginx
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
X-Cache-Status: disabled

Because our website, like pretty much any other serious one out there, redirects unencrypted HTTP traffic to HTTPS, we get back a redirect code from the 3xx family.

In addition, the redirect response comes with a special header, Location, that tells the client where to send the request instead. In our case, it is the https version of the website.
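To make the client’s side of this concrete, here is a rough sketch of how a client could decide where to go next. The function name redirectTarget is made up for illustration, and it assumes the headers arrive as a plain object with lowercase keys, which is how Node’s http module exposes them.

```javascript
// Given the URL we requested plus the response's status and headers,
// work out where the client should go next.
// Returns null when the response is not a redirect.
function redirectTarget(requestUrl, status, headers) {
  const isRedirect = status >= 300 && status < 400;
  const location = headers['location'];
  if (!isRedirect || !location) {
    return null;
  }
  // Location may be a relative reference, so resolve it against the request URL.
  return new URL(location, requestUrl).href;
}

console.log(
  redirectTarget('http://beaver.codes', 301, { location: 'https://beaver.codes/' })
); // https://beaver.codes/
```

This mirrors what curl does when it is run with the -L flag, which tells it to follow redirects.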

These are not the droids…

To implement shortlink functionality, we must take the short URL and somehow serve the same content as the original URL.

One implementation is to do just that: on each request, the server would fetch the full URL itself and pipe the response back to the caller.

This is suboptimal, though. All that traffic and data would go through our server. Say somebody downloads a big video file using this method. We would have to cover the cost of that bandwidth and deal with the performance implications.

What these shortlink tools actually do instead is use HTTP redirects.

Let’s break it down into three steps:

1. The user adds the link they want a shortlink for

The shortener app then creates a record in its database that stores the original URL and a new identifier.

It is quite common to use a mix of letters and numbers for the identifier, for example k-1q8J-kr0. NanoId is one library that can generate these for us.
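As a rough illustration of this step, the sketch below generates an identifier using only Node’s built-in crypto module rather than NanoId, and keeps the records in a Map that stands in for the database. The names generateIdentifier, createShortlink and records, the 10-character length and the record shape are all assumptions made for this example.

```javascript
const { randomInt } = require('node:crypto');

// 64 URL-safe characters: letters, digits, underscore and hyphen.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-';

// Build a random identifier one character at a time.
function generateIdentifier(length = 10) {
  let id = '';
  for (let i = 0; i < length; i++) {
    id += ALPHABET[randomInt(ALPHABET.length)];
  }
  return id;
}

// Hypothetical in-memory "database": identifier -> record.
const records = new Map();

function createShortlink(fullURL) {
  // The URL constructor throws a TypeError if fullURL is not a valid absolute URL.
  new URL(fullURL);

  let identifier = generateIdentifier();
  // Collisions are unlikely, but retry if the identifier is already taken.
  while (records.has(identifier)) {
    identifier = generateIdentifier();
  }

  const record = { identifier, fullURL, createdAt: new Date() };
  records.set(identifier, record);
  return record;
}

const record = createShortlink('https://beaver.codes/some/very/long/path?with=params');
console.log(record.identifier.length); // 10
```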

2. The shortlink app “generates” the new short URL

In the next step, the app shows the user their new short URL. There are two requirements for it: a) the domain needs to point back to the app, and b) the new URL needs to reference the new identifier, k-1q8J-kr0.

The identifier can be either a path segment, as in https://link.beaver.codes/k-1q8J-kr0, or a query parameter, as in https://link.beaver.codes/?id=k-1q8J-kr0.

Note that the shortened URL’s final length is not related to how long the original URL was. It is not some form of encoding/decoding. The final short URL does not hold the full URL by itself. We need the backend server’s DB to resolve it.
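Whichever URL shape is used, the server has to pull the identifier back out before it can look anything up. A minimal sketch of that parsing could look like this. The function name parseIdentifierFromUrl is hypothetical, and the id parameter name is taken from the example URL above.

```javascript
// Extract the identifier from either URL shape:
//   https://link.beaver.codes/k-1q8J-kr0
//   https://link.beaver.codes/?id=k-1q8J-kr0
function parseIdentifierFromUrl(shortUrl) {
  const url = new URL(shortUrl);

  // Prefer the query parameter when it is present.
  const fromQuery = url.searchParams.get('id');
  if (fromQuery) {
    return fromQuery;
  }

  // Otherwise take the first non-empty path segment.
  const [segment] = url.pathname.split('/').filter(Boolean);
  return segment ?? null;
}

console.log(parseIdentifierFromUrl('https://link.beaver.codes/k-1q8J-kr0')); // k-1q8J-kr0
console.log(parseIdentifierFromUrl('https://link.beaver.codes/?id=k-1q8J-kr0')); // k-1q8J-kr0
```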

3. The end user navigates to the short link

When somebody goes to the shortlink https://link.beaver.codes/k-1q8J-kr0, the server parses the identifier, k-1q8J-kr0, back out of the URL.

With the identifier, it retrieves the database record, which includes the original full URL.

The final thing it needs to do is respond with a 3xx status code and the Location header set to that URL.

A simple handler for this whole lookup-and-redirect flow can look like this:

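const express = require('express');
const app = express();

// parseIdentifier and findRecord are the app's own helpers, defined elsewhere.
// '/**' matches every path in Express 4; Express 5 uses named wildcards instead.
app.all('/**', async (req, res) => {
  // Pull the identifier out of the request's path or query string.
  const identifier = parseIdentifier(req);

  // Await the lookup in case findRecord returns a promise (e.g. a DB query).
  const record = await findRecord(identifier);

  if (record) {
    // res.redirect responds with 302 and a Location header by default.
    return res.redirect(record.fullURL);
  } else {
    return res.status(404).send('Record not found');
  }
});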
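For readers who want to see the status code and Location header spelled out, here is a dependency-free sketch of the same idea using Node’s built-in http module. It is not the code from our app: the records Map stands in for the database, resolveRequest is a made-up helper, and only the path form of the short URL is handled.

```javascript
const http = require('node:http');

// Hypothetical in-memory store standing in for the database.
const records = new Map([
  ['k-1q8J-kr0', { fullURL: 'https://beaver.codes/' }],
]);

// Decide the response for a request path such as '/k-1q8J-kr0'.
function resolveRequest(path) {
  // Drop any query string, then take the first non-empty path segment.
  const identifier = path.split('?')[0].split('/').filter(Boolean)[0];
  const record = records.get(identifier);

  if (!record) {
    return {
      status: 404,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Record not found',
    };
  }

  // A 3xx status plus a Location header is all a redirect needs.
  return { status: 302, headers: { Location: record.fullURL }, body: '' };
}

const server = http.createServer((req, res) => {
  const { status, headers, body } = resolveRequest(req.url);
  res.writeHead(status, headers);
  res.end(body);
});

// Uncomment to serve on a port of your choice:
// server.listen(3000);

console.log(resolveRequest('/k-1q8J-kr0').status); // 302
console.log(resolveRequest('/missing').status); // 404
```

With the server listening, running curl -I against a short link on that port should show the 302 status line and the Location header, just like the http:// example earlier.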

More more more

The beauty of URL shorteners lies in the step in between, where they look up the final URL to send the caller to. They can be much more than tools to shorten URLs.

The first thing the shortener can do is add analytics to that process. It can easily track how many times it had to look up a particular record and when those lookups happened.
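A very small sketch of such tracking is shown below. It assumes an in-memory Map instead of a real analytics store, and the names hits, recordHit, hitCount and hitsByHour are invented for this example. The server would call recordHit right after it finds the record and before it sends the redirect.

```javascript
// Hypothetical in-memory hit log: identifier -> array of lookup timestamps.
const hits = new Map();

// Remember that the identifier was looked up at the given time.
function recordHit(identifier, when = new Date()) {
  if (!hits.has(identifier)) {
    hits.set(identifier, []);
  }
  hits.get(identifier).push(when);
}

// How many times has this identifier been resolved?
function hitCount(identifier) {
  return (hits.get(identifier) || []).length;
}

// Group hits by UTC hour of day (0-23) to see when a link is used.
function hitsByHour(identifier) {
  const buckets = new Array(24).fill(0);
  for (const when of hits.get(identifier) || []) {
    buckets[when.getUTCHours()] += 1;
  }
  return buckets;
}

recordHit('k-1q8J-kr0', new Date('2024-04-22T18:36:05Z'));
recordHit('k-1q8J-kr0', new Date('2024-04-22T18:41:20Z'));
console.log(hitCount('k-1q8J-kr0')); // 2
console.log(hitsByHour('k-1q8J-kr0')[18]); // 2
```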

Because the URL shortener loads the final URL from the database, the destination can be changed over time. For example, if a marketing agency prints a short URL as a QR code in an advertisement on train seats, it is still straightforward to change where it points after the ad has started to roll out across the country.
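In code, retargeting a link is just an update of the stored record, while the identifier printed in the QR code stays the same. The sketch below uses a Map as a stand-in for the database, and the updateTarget name and campaign URLs are invented. One caveat worth keeping in mind: this works most reliably with a temporary redirect such as 302, since clients are allowed to cache a permanent 301.

```javascript
// Hypothetical in-memory store standing in for the database.
const records = new Map([
  ['k-1q8J-kr0', { fullURL: 'https://beaver.codes/spring-campaign' }],
]);

// Point an existing identifier at a new destination.
function updateTarget(identifier, newFullURL) {
  const record = records.get(identifier);
  if (!record) {
    throw new Error(`Unknown identifier: ${identifier}`);
  }
  // The URL constructor throws a TypeError if the new destination is not a valid URL.
  new URL(newFullURL);
  record.fullURL = newFullURL;
  return record;
}

updateTarget('k-1q8J-kr0', 'https://beaver.codes/summer-campaign');
console.log(records.get('k-1q8J-kr0').fullURL); // https://beaver.codes/summer-campaign
```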

Finally, the resolution can be conditional, too. The shortener can require a password before redirecting, or the link can simply expire after some time.
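A conditional check like that could be sketched as follows. The expiresAt and password fields, the checkAccess name and the choice of 410 and 401 as response codes are all assumptions for illustration. A real app would store a password hash rather than the plain text compared here.

```javascript
// Decide whether a record may be resolved right now.
// `expiresAt` and `password` are hypothetical optional fields on the record.
function checkAccess(record, { now = new Date(), password } = {}) {
  // Expired links are refused with 410 Gone.
  if (record.expiresAt && now >= record.expiresAt) {
    return { ok: false, status: 410, reason: 'Link expired' };
  }
  // Plain-text comparison keeps the sketch short; use a hash in practice.
  if (record.password && record.password !== password) {
    return { ok: false, status: 401, reason: 'Password required' };
  }
  return { ok: true, status: 302, location: record.fullURL };
}

const record = {
  fullURL: 'https://beaver.codes/',
  expiresAt: new Date('2024-05-01T00:00:00Z'),
  password: 'hunter2',
};

const before = new Date('2024-04-22T18:36:05Z');
const after = new Date('2024-05-02T00:00:00Z');

console.log(checkAccess(record, { now: before, password: 'hunter2' }).status); // 302
console.log(checkAccess(record, { now: before }).status); // 401
console.log(checkAccess(record, { now: after, password: 'hunter2' }).status); // 410
```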

This is the end, Beautiful friend

In this post, I took the time to explore HTTP status codes. We primarily looked at the 3xx category – redirects.

Redirects are the core technological piece that makes URL shorteners work. I broke down the steps such a tool goes through.

Finally, we discussed some of the extra benefits these tools offer beyond shortening URLs.