Things I Learned From Link Chomp
Jon’s note: Keen readers may notice that I am not publishing the audio portion of my blog posts anymore. I made that decision based strictly on usage: hardly anyone downloaded the podcast files, so I take that to mean there’s very little interest in them.
I recently had a need for a link shortener, and that simple need quickly turned into an idea to build one from scratch. I figured it couldn't be that hard, and I was right. Shortening the links is easy, but dealing with the unwashed masses of the internet that are going to use it is another problem entirely. I have lots of little internet-based projects that are locked down for only my use because the internet is rife with people who just want to watch the world burn, and they'll try to destroy everything that crosses their path. I met some of them during my first week with Link Chomp.
The mechanics
I went into this project with my eyes wide open to infosec concerns. I maintain infrastructure for an infosec company and while I am not a researcher, I have a healthy exposure to the types of bad things people do on the internet. Based on that experience, I designed the architecture with these concepts in mind:
No databases. Everything I install presents yet another attack vector. There are some things that I simply need, such as a web server, but database servers are a really big target so I said “no” to that.
Segregated into micro-services. Historically, most web apps are monolithic hunks of code, and a change to one area ripples through the rest of the app. Modern thinking eschews that idea and instead embraces the concept of micro-services. Under a micro-services architecture, an app is not one thing. It is composed of smaller services that run independently of each other. This allows greater availability because if one service has a problem, the others keep humming along. It also allows for better security because if one service is compromised, that does not open the door wide for the entire application to be compromised.
Try to make the links safe. I’m tilting at windmills with this concept, but I need to try. Link shorteners ostensibly just…well…shorten links. So a long link like https://www.pluralsight.com/courses/aws-certified-cloud-practitioner becomes a short little guy like this: https://cmp.cx/8a971. Keen readers will notice that the destination link is now totally obscured. Bad guys know that and they use link shorteners to trick people into clicking bad links by hiding the true destination. I try to address that.
More detail
Let’s examine each of those points in more detail.
No database
I have two concerns with running a database. I already mentioned the first, which is how attractive databases are to attackers. They're attractive because they're complicated, and it can be hard to write code that prevents really clever attackers from extracting data from them. Virtually every data breach we hear about these days takes the form of a bad guy successfully exfiltrating data from a database.
The other main reason that databases are attractive is that bad guys can sometimes put data into a database, not just take it out. A bad guy can write a piece of code to…say…email the contents of the login form to them and then store that piece of code in the database so it executes every time someone logs in.
Link Chomp avoids this by reverting to a much earlier technology: text files. While it is certainly possible to exfiltrate data from or inject data into a text file, it is harder. It is harder because text files are easier to secure using basic Linux file system attributes, and simple is always better.
There are downsides to this decision. The main one is concurrency. Database servers know what to do if two Link Chomp users create a new "chomp" (my cute name for a shortened link) at the same time. The database server can make sure that two people don't accidentally get the same chomp code and don't accidentally overwrite each other's new chomp. Because I use text files, I have to do all that work myself.
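To give a sense of what that work looks like, here is a minimal sketch of concurrency-safe chomp creation against a flat file. This is not Link Chomp's actual code; the file path, the layout (one "code url" pair per line), and the function name are assumptions for the example.

```php
<?php
// Hypothetical flat-file store: one "code url" pair per line.
// flock() provides the mutual exclusion a database would normally give us.
function create_chomp(string $url, string $file = '/var/lib/linkchomp/chomps.txt'): string
{
    $fh = fopen($file, 'c+'); // create if missing, do not truncate
    if ($fh === false || !flock($fh, LOCK_EX)) {
        throw new RuntimeException('Could not lock the chomp store');
    }

    // Read the existing codes so we never hand out a duplicate.
    $codes = [];
    while (($line = fgets($fh)) !== false) {
        $codes[strtok($line, ' ')] = true;
    }

    // Generate a 5-character hex code, retrying on collision.
    do {
        $code = substr(bin2hex(random_bytes(3)), 0, 5);
    } while (isset($codes[$code]));

    fseek($fh, 0, SEEK_END);
    fwrite($fh, $code . ' ' . $url . "\n");
    fflush($fh);
    flock($fh, LOCK_UN); // release the lock before closing
    fclose($fh);

    return $code;
}
```

The exclusive lock serializes writers, which is exactly the duplicate-code and overwrite protection a database would otherwise provide for free.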
Micro-services
I will be the first to admit that I could have pursued this a little further than I did. I broke Link Chomp into three services: the interface where people go to create new chomps, the service that redirects users when they click a chomped link, and the background services that perform backups and expire old links.
The process of creating a chomp is quite involved – there are many steps, such as ensuring the destination domain is properly formatted, checking that the link is not blatantly malicious, ensuring there are no duplicate chomp codes, and then the whole process of recording all this stuff. I could have broken all those into separate services, but because the application requires almost all of that to happen, a single broken service would break the app anyhow, so the risk/reward ratio was not in favour of doing the extra work.
However, the chomp code is very portable. It has no dependencies outside of a few standard PHP modules, and the code flow is broken nicely into compartmentalized functions. That makes it very easy to troubleshoot problems and also makes it easy to add new functionality. For example, I am always messing around in the function that checks if a link seems bad. It is a work in progress that likely will never be very accurate, but I am always adding tests, then evaluating whether those tests impact the speed of the site, and then adjusting as needed.
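One way to keep those tests pluggable and measurable is sketched below. The structure, the check names, and the specific heuristics are my assumptions for illustration, not the real Link Chomp code.

```php
<?php
// Hypothetical pluggable heuristics: each entry is a name plus a callable
// that returns true when the URL looks bad. Timing each check makes it
// easy to spot which heuristic is slowing the site down.
$checks = [
    'ip_literal_host' => fn (string $url): bool =>
        (bool) filter_var(parse_url($url, PHP_URL_HOST), FILTER_VALIDATE_IP),
    'deep_subdomains' => fn (string $url): bool =>
        substr_count((string) parse_url($url, PHP_URL_HOST), '.') > 5,
];

function url_looks_bad(string $url, array $checks): bool
{
    foreach ($checks as $name => $check) {
        $start = hrtime(true);
        $bad   = $check($url);
        $usecs = (hrtime(true) - $start) / 1000;
        error_log(sprintf('check %s took %.1f us', $name, $usecs));
        if ($bad) {
            return true;
        }
    }
    return false;
}
```

Because each test is just another entry in the array, adding one, timing it, and ripping it back out when it proves too slow is a one-line change.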
Safety
The last paragraph above sums up the safety issue. Because we know bad guys use link shorteners to obfuscate destination URLs, I am compelled to make an effort to prevent that. But, let's get real, that is pretty much impossible. Bad guys don't link to obvious things like www . thisisabadlink . com. They know that's going to get banned everywhere. Instead, they spend a ton of time breaking into legitimate websites and putting phishing pages on those sites. Because of that, bad pages containing phishing code, credit-card-stealing code, and the like usually reside on legitimate websites that are not on any blacklists and are very hard to identify as bad from the link alone. In theory, it would be possible to scan the page looking for bad code, but that is simply too slow and would make Link Chomp unusable.
So, what do I do instead? I do some checking against a bad word list and I check to make sure the domain is properly formatted. I also cap the URL length, which is kind of a weird decision. Link Chomp is a link shortener, so it is assumed that users are going to come to it with really long links. But before I capped it, I would routinely see internet goofs pasting in URLs thousands of characters long. So, capping is required, and there will probably be a few legitimate users caught up in that, but I don't have a better solution right now.
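Roughly, that gatekeeping could look like the sketch below. The 2,048-character cap, the word list entries, and the function name are illustrative assumptions, not Link Chomp's real values.

```php
<?php
// Hypothetical front-door validation: length cap, URL format, domain
// format, and a bad word list. All constants here are made up.
const MAX_URL_LENGTH = 2048;
const BAD_WORDS = ['phish-kit', 'free-giftcard']; // illustrative entries only

function url_is_acceptable(string $url): bool
{
    if (strlen($url) > MAX_URL_LENGTH) {
        return false; // goofs pasting in absurdly long URLs
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // not a properly formed URL
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !preg_match('/^[a-z0-9.-]+\.[a-z]{2,}$/i', $host)) {
        return false; // domain fails the basic format check
    }
    foreach (BAD_WORDS as $word) {
        if (stripos($url, $word) !== false) {
            return false; // matches the bad word list
        }
    }
    return true;
}
```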
Next, Link Chomp resides behind the Sucuri firewall (this is not a secret; anyone can see that if they know how). Sucuri has a nice API and I use it to block repeated bad requests. If someone is obviously hammering Link Chomp with bad URLs, I issue an API call to Sucuri to block their IP. IP-based blocking isn't perfect, but these guys are not determined attackers. They're just being knobs, and they go away once they encounter even a small bit of resistance like this.
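Mechanically, that kind of call is just an authenticated HTTP request. The endpoint, parameter names, and action below are placeholders rather than Sucuri's documented API; consult their WAF API docs for the real shape.

```php
<?php
// Hedged sketch of blocking an abusive IP via an HTTP API call.
// The URL, parameters, and action name are PLACEHOLDERS, not Sucuri's
// documented API; check the vendor's WAF API docs for the real ones.
function block_ip(string $ip): bool
{
    $params = http_build_query([
        'k'  => getenv('WAF_API_KEY'),    // assumed: key from the environment
        's'  => getenv('WAF_API_SECRET'), // assumed: secret from the environment
        'a'  => 'blacklist_ip',           // assumed action name
        'ip' => $ip,
    ]);

    $ch = curl_init('https://waf.example.net/api?' . $params); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $response = curl_exec($ch);
    curl_close($ch);

    return $response !== false;
}
```

In practice you would trigger something like this only after an IP trips a threshold, say a handful of rejected submissions in a few minutes, so one fat-fingered paste doesn't get a real user banned.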
Future plans
Now that the framework is built, it is easier to add functionality. Some of the ideas on my road map are:
URL blacklist checking
I’d like to check submitted URLs against a blacklist. The obvious choice here is Safe Browsing, but there are a few others that I am considering. This is a very important decision because the check has to be extremely fast. I do not want to add a second or two to the chomp creation while we check a blacklist.
Initially, I thought that using a service that would let me download a list of blocked URLs would be best because I can check that very quickly on the server. However, I quickly realized that idea was foolish for a few reasons.
The first reason that idea will not work is the sheer size of a URL blacklist. There are just millions upon millions of bad sites out there and I don’t think it is feasible to handle a file that large at run time.
The second reason is that the list would always be somewhat out of date. I would download it periodically, but probably only daily, or maybe a few times a day. The problem with outdated blacklists is that the bad guys put the most effort into spreading their bad link via phishing emails in the first few hours after they create it. Some percentage of the users receiving those phishing emails will report it, and the URL will be blacklisted reasonably soon. If I am using even a slightly outdated blacklist, Link Chomp will be blind to the blacklisted domain for longer than it would be if I were checking in real time.
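For comparison, a real-time check against the Google Safe Browsing Lookup API (v4) could look something like the sketch below. It assumes an API key in the environment and skips error handling; whether the round-trip is fast enough is exactly the open question.

```php
<?php
// Minimal sketch: ask the Safe Browsing Lookup API (v4) about one URL.
// Assumes SAFEBROWSING_KEY is set; an empty "matches" array means no hit.
function url_is_blacklisted(string $url): bool
{
    $body = json_encode([
        'client' => ['clientId' => 'linkchomp', 'clientVersion' => '1.0'],
        'threatInfo' => [
            'threatTypes'      => ['MALWARE', 'SOCIAL_ENGINEERING'],
            'platformTypes'    => ['ANY_PLATFORM'],
            'threatEntryTypes' => ['URL'],
            'threatEntries'    => [['url' => $url]],
        ],
    ]);

    $ch = curl_init('https://safebrowsing.googleapis.com/v4/threatMatches:find?key='
        . getenv('SAFEBROWSING_KEY'));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 3, // keep chomp creation snappy
    ]);
    $response = curl_exec($ch);
    curl_close($ch);

    $result = json_decode((string) $response, true);
    return !empty($result['matches']); // any match means the URL is listed
}
```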
Why do I care? Well, when you chomp a link you get a shortened URL back from the cmp.cx domain. That means there are tons of links out there using cmp.cx and I do not want that domain to get blacklisted. If cmp.cx were to be blacklisted, it would cause a lot of problems for the people using those chomps.
So, there is work to be done here. I am not sure what the final solution looks like yet.
Custom links and subdomains
I started this project because I wanted subdomain support which, ironically, I have not built in yet, but I think it is a great idea. Subdomains are domains tacked onto the front of a domain. For example, in jonwatson.substack.com, "jonwatson" is a subdomain of "substack.com".
Subdomain support would allow shortened URLs in the form of 8a971.cmp.cx. What you get back from Link Chomp now is something like cmp.cx/8a971. I want that because I want to deploy a wildcard cert for *.cmp.cx and then offer TLS-secured subdomain forwarding. Maybe nobody cares about that other than me, but I like the idea.
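On the code side, the redirect service would just need to pull the chomp code from the Host header instead of the path. A rough sketch, with the cmp.cx base domain hard-coded as an assumption:

```php
<?php
// Hypothetical: resolve the chomp code whether it arrives as
// 8a971.cmp.cx or cmp.cx/8a971. The subdomain form also needs a
// wildcard DNS record and the *.cmp.cx cert mentioned above.
function extract_chomp_code(string $host, string $path): ?string
{
    // Subdomain form: strip the base domain off the Host header.
    if (preg_match('/^([a-z0-9]+)\.cmp\.cx$/i', $host, $m)) {
        return $m[1];
    }
    // Path form: the first path segment, ignoring any query string.
    $segment = trim((string) parse_url($path, PHP_URL_PATH), '/');
    return $segment !== '' ? $segment : null;
}

// Usage inside the redirect service:
// $code = extract_chomp_code($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
```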
Custom links are a similar but different thing: they allow users to specify the chomp code part of the URL instead of accepting the randomly generated one. For example, this random chomp https://cmp.cx/4907b can be set to https://cmp.cx/notarickroll. Ok, it totally is a Rick Roll, but you get my point.
The blocker with these two ideas is that I want custom stuff like this to be a premium service, not open to the unwashed masses. To support that, I need accounts – the ability for users to sign up – and that has a whole chunk of work behind it, so it is slated for some future weekend.
Final thoughts
Every project I've worked on turns out to be more complex than it seems. I did not know exactly what the complexities would be with Link Chomp, but it did not disappoint. However, the basic idea is simple, so the complexities did not become insurmountable or overwhelm me. And now I am in a position to build smaller, neater features into it as I go along.
You can find Link Chomp here.
my shorter content on the fediverse: https://the.mayhem.academy/@jdw