Issue 011

Elixir hot code swapping to the rescue

How I stopped a denial of service attack on my platform whilst on holiday and without access to my code

May 10, 2023 · 7 minute read

On Saturday morning, I woke up late, still satisfied from last night’s tasty Amatriciana, Tiramisu, and wine. I took a moment to enjoy the warm breeze that Rome’s 10am winds brought with them. The plan for the day was an exciting visit to the Colosseum, followed by an inspiring tour of the Vatican. However, everything changed when I checked my emails and found a message from Fly.io.

My server for madepublic.io had crashed in the middle of the night.

Odd, I thought - there’s a dedicated VPS with 2GB of RAM running that Elixir application - what could have happened?

I tried to access the platform using my phone (I didn’t have my laptop with me, as I’d decided I wasn’t going to be doing any work or indiehacking this weekend) and got no response.

I logged into Fly.io to check the logs and saw a stream of 200 OK responses coming from my user registration controller action. It looked like someone was creating 240 user accounts per second, with real email addresses and Bitcoin-related spam in their user details.

A fake-account spam attack while I’m on holiday - great.


I scaled the server up to 2GB to assess the damage.

Madepublic has a real-time metadata visualisation on the homepage built with Phoenix PubSub that shows the number of registered users, projects, project views, etc.

So, what was the damage? 150,000 new accounts (and counting - by scaling the server up, the script attacking me was back in action). It’s a good thing I didn’t have autoscaling on Fly.io turned on, or I would have been helping the attacker create false accounts.

I took a big sigh, told my girlfriend I’d be a little late for breakfast, and started thinking about what I could do without my personal laptop, access to my code, access to Fly.io, or access to anything I would normally use to stop this (and in time to meet all the scheduled plans for the day).

Attempt 1

Scale the server back down to less RAM to stop the script running.

Fly.io has a dashboard I could access from my phone. I had scaled the server up, so I could scale it back down - that would stop the accounts being created, but I’d be turning off my entire platform for any real users - a bad look. No bueno.

Attempt 2

Use my girlfriend’s laptop to push a fix

My girlfriend had her work laptop thanks to working abroad before I joined her! Excellent, I’ll download Git, my code and the fly command line tools.

Argh, Windows - never mind, I’ll press on, I thought.

I pulled down my code (after panic typing the Airbnb’s 24 character Wi-Fi password incorrectly 4 times) and when I tried to push to GitHub I got an administrator error asking for an Admin JumpCloud password. Great.

What else did I have near me? My flatmate’s laptop (he was also on holiday with us, along with his girlfriend)!

Attempt 3

Use my flatmate’s laptop to deploy a code change

I called him at breakfast, found his laptop, got his password, repeated half the steps from Attempt 2 and was met with the need to insert an admin PIN (Why Windows?!). Okay cool, no problem, just a phone call away.

My flatmate uses his laptop entirely for Chrome and Microsoft Word - he’s never tried to install anything on it, so when I told him that I needed the PIN that his ex-girlfriend had evidently set up (as the laptop was shared between them during their relationship), I knew my morning wasn’t going to get better anytime soon.


After a bit of texting from his end, I finally had the PIN (thanks Paul). But there was a problem - I had to be across Rome in 30 minutes for a pre-paid tour of the Colosseum and time was running out - I didn’t have the time to install Elixir, asdf, flyctl, git and an IDE (and figure out how to do all of that on an OS I’d not used in 10 years).

So what did I do?

The hackiest thing I’ve ever done on a production server:

I re-defined my user registration controller action while the server was still running.

Attempt 4

Hot-code deploy an invalid controller action

Elixir applications run on the BEAM virtual machine, and you can attach an IEx session to your running application. You know what you can also do as a result? Redefine your modules in real time.
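To make the idea concrete, here is a minimal sketch of redefining a module at runtime. It uses only the Elixir standard library, and `Demo.Signup` is a hypothetical module invented for illustration; it is not part of the Madepublic codebase. The same thing happens when you paste a `defmodule` into an IEx session attached to a running node.

```elixir
# Hypothetical module for illustration only; not from the Madepublic codebase.
# Silence the "redefining module" warning, since redefinition is the point here.
Code.compiler_options(ignore_module_conflict: true)

defmodule Demo.Signup do
  # Original behaviour: accept every registration.
  def create(email), do: {:ok, email}
end

before_swap = Demo.Signup.create("a@example.com")

# Evaluating a second defmodule with the same name compiles the new version
# and loads it into the running VM, replacing the previous one.
defmodule Demo.Signup do
  # Patched behaviour: refuse every registration.
  def create(_email), do: {:error, :registration_disabled}
end

after_swap = Demo.Signup.create("a@example.com")

# The same call now hits the new code without the VM ever stopping.
IO.inspect({before_swap, after_swap})
```

A redefinition like this lives only in the memory of the running node. A restart or a fresh deploy loads the compiled release again, so it works as a stopgap and not as a permanent fix.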

I downloaded flyctl and authenticated myself using my phone, accessed my application and used the fly ssh console command to get straight into my server.

Once in, I was inside my VPS, so I had to find my application - after some digging I found what I was looking for in /app/bin/madepublic.

I got into an IEx session and started to think about what I could do in 5 minutes to stop the script - I was racing against the clock and I was determined not to give the bastard responsible any more of my holiday, but at the same time, I also didn’t want to let any of my users down.

So I thought if I could disable the ability to create accounts, existing users could still use the platform and no more accounts could be made.

In IEx, I redefined my module and skipped the changeset that was required to render the page in both the GET handler for the page and the POST handler for submitting the form - that way it didn’t matter whether or not the script attacking me was a headless browser. Could I have just thrown an error instead? Probably, but I was stressed, a little hungover, full of last night’s pasta and with a group of people I was probably starting to piss off.

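defmodule MadepublicWeb.Users.RegistrationController do
  use MadepublicWeb, :controller

  # GET handler: render the form without assigning the changeset it expects.
  def new(conn, _params) do
    render(conn, "new.html")
  end

  # POST handler: no user is created; the same template is rendered again.
  # The leading underscore avoids an unused-variable warning.
  def create(conn, %{"user" => _user_params}) do
    render(conn, "new.html")
  end
end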

I refreshed my page and boom, 500 error.

It had worked.

The server load went down so users could carry on using the platform as intended, and even though the malicious script kept going for another 5 hours, all it did now was give a 500 error that a changeset wasn’t available in a form.

The Colosseum tour was saved (as was my server bandwidth)!

So what did I learn?

Secure your site from bot attacks before you enable auto-scaling. I was lucky that I didn’t have auto-scaling enabled or the damage would have actually been worse. I’m grateful that this ordeal only meant that my free plan for Mailjet’s email sending maxed out for the month and I didn’t end up having to pay for 150,000 confirmation emails.

Yay for not being so successful that I have a dedicated plan, I guess?

I initially thought about throwing Cloudflare over my side projects and implementing Google reCAPTCHA going forward - a good suggestion for any indie hackers who are rolling their own auth - but then I realised that I’m an indiehacker, I want to do the least amount possible!

So I’m using an Elixir-based method to protect my registration form, which was highlighted to me by Michael Lubas, the founder of Paraxial (a bot protection and security platform for Elixir and Phoenix), on the Elixir Slack. Shout out to this awesome community.

Michael suggested using PlugAttack which I’ll be writing about in an upcoming blog post if people are interested (let me know on Substack or Twitter).
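Until that post exists, here is a rough sketch of the general idea behind throttling a registration form: count attempts per client in a short time window and refuse once a limit is hit. This is not PlugAttack’s API. `Demo.Throttle`, the table name, the IP address and the limits are all hypothetical, and the counter is a plain fixed-window scheme built on ETS from the standard library.

```elixir
# Hypothetical fixed-window throttle; a stand-in for the idea, not PlugAttack itself.
defmodule Demo.Throttle do
  @table :demo_throttle

  # Creates the counter table once; safe to call repeatedly.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  # Returns true while `key` has made at most `limit` attempts in the current
  # window of `window_ms` milliseconds. `now_ms` can be passed in for testing.
  def allow?(key, limit, window_ms, now_ms \\ System.system_time(:millisecond)) do
    window = div(now_ms, window_ms)

    # Atomically increment the counter for {key, window},
    # inserting it with a count of 0 first if it does not exist yet.
    count = :ets.update_counter(@table, {key, window}, {2, 1}, {{key, window}, 0})
    count <= limit
  end
end

Demo.Throttle.init()

# Three attempts from the same made-up IP, with a limit of two per second.
results = for t <- [0, 10, 20], do: Demo.Throttle.allow?("203.0.113.7", 2, 1_000, t)
IO.inspect(results)
```

In a Phoenix app a check like this would presumably sit in a plug in front of the registration action, keyed on the client IP. Note that this sketch never deletes counters from old windows, so a real version would need some form of cleanup.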

Bring your laptop with you on holiday if you have a platform in production that you care about being in production. If I had my laptop with me, I could have skipped the most stressful part of this entire ordeal and had access to what I needed immediately.

Fly.io are awesome for letting me get into my server with a single terminal command - it would be even better if I could do this from a web browser. If you’re reading this Chris McCord, there’s a feature request for you.

Elixir is awesome for giving me the (admittedly terrifying) capability to hot-update my code in production - BEAM ❤️.

I hope you’ve found this cautionary tale of Elixir production debugging while in Rome useful.

Be sure to subscribe to my Substack below for more content like this, and follow me on Twitter for more indiehacking tips and insights.

