Issue 05

Custom JSON encoding for structs in Elixir with Jason

A guide for Elixir developers (and why you might want to)

Jul 20, 2022 · 5 minute read

Jason?

Jason is an actively maintained JSON parser and generator for Elixir. If you need to build JSON structures to send to your clients, or just shepherd data around your Elixir application, you’re likely to reach for it or for Poison - and the contents of this post are relevant to Poison too.

Purpose & Use-case

Disclaimer

The following isn’t required - feel free to skip to the content below for the tutorial proper. Reading this, however, will give you an understanding and real-life example of when you might want to use custom Jason encoding.

At Multiverse we’re developing a new version of an existing third-party integration and taking it in-house as part of our core platform. The feature in question isn’t important; what matters is that there’s a lot of data - think millions of records.

One problem is that we don’t have direct access to the database of the existing system - what we do have is the ability to export the data into a CSV! Thankfully there are only 8 or so fields for each record meaning we can resort to a good, old-fashioned CSV import to bring the data over.

Simple right? Wrong.

Importing 1m+ rows of data in PSQL isn’t too bad in itself, but it will slow down, and potentially lock up, the rest of our application while the import runs - and that’s bad! We also need to transform the data into a more reasonable structure for our new implementation. The old version had some inconsistencies and type issues we’d like to do away with, for example dates represented in epoch time and data fields we no longer need.

Flash lesson ⚡️ - Epoch

Epoch time (a.k.a. Unix time) is a system for describing a point in time by counting the number of seconds (excluding leap seconds) that have elapsed since the Unix epoch - the Unix epoch being 00:00:00 UTC on 1 January 1970.
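As a rough sketch of what that means in Elixir, the standard library can turn an epoch value into a DateTime. The timestamp below is a made-up example value, not one taken from the real export.

```elixir
# Hypothetical value: an epoch timestamp as it might appear in a CSV export.
raw = "1658275200"

# CSV fields arrive as strings, so convert to an integer first.
seconds = String.to_integer(raw)

# DateTime.from_unix!/1 treats the integer as seconds since the Unix epoch (UTC).
datetime = DateTime.from_unix!(seconds)

IO.puts(DateTime.to_iso8601(datetime))
# prints 2022-07-20T00:00:00Z

# Zero seconds is the epoch itself.
IO.puts(DateTime.to_iso8601(DateTime.from_unix!(0)))
# prints 1970-01-01T00:00:00Z
```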

So now we need to transform and import each record, which will take time. To mitigate this, we’re going to use RabbitMQ, an open source message broker: we package each row from our CSV into a JSON message, fire the messages off asynchronously and let our RabbitMQ consumer deal with them concurrently as they come in. If you’re interested in learning more about Rabbit, let me know!

So where does Jason encoding come in?

Remember that we want to transform our data. The issue with exporting everything into a CSV file is that when we read it into our application, every value comes back as a string. Converting our date fields now involves calling String.to_integer/1 on each one by hand, which is something we don’t want to do - imagine we had 100 fields to import!
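To make the problem concrete, here is a minimal sketch using only the standard library. The CSV line is invented for illustration, and the naive String.split/2 is an assumption made for brevity; a real import would want a proper CSV parser, since notes can contain commas or quotes.

```elixir
# Hypothetical CSV line with the same 11 columns the import in this post expects.
line = "42,7,3,90,,Read chapter two,1658275200,1658275200,1658361600,,"

# Naive split, for illustration only.
[id, user_id, _logtype_id, time, _target, notes, date | _rest] = String.split(line, ",")

# Every field is a string at this point, including the numeric ones.
IO.inspect({id, date, notes})

# Converting by hand means one String.to_integer/1 call per numeric field.
converted = %{
  id: String.to_integer(id),
  user_id: String.to_integer(user_id),
  time: String.to_integer(time),
  date: String.to_integer(date),
  notes: notes
}

IO.inspect(converted)
```

With a handful of fields this is tolerable; with a hundred it becomes the kind of repetition worth pushing into one place.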

A good programmer is a lazy programmer.

Defining our struct

defmodule LegacyLog do
  defstruct [:id, :user_id, :legacy_log_id, :time, :notes, :date, :inserted_at, :updated_at]
end

All we have here is a simple Elixir struct with a field for each value we want to keep from a record in the CSV import.

Using it in our CSV import:

  defp serialise(data) do
    data
    |> Enum.map(fn row ->
      [id, user_id, logtype_id, time, _target, notes, date, timecreated, timemodified, _, _] = row

      %LegacyLog{
        id: id,
        user_id: user_id,
        legacy_log_id: logtype_id,
        time: time,
        notes: notes,
        date: date,
        inserted_at: timecreated,
        updated_at: timemodified
      }
    end)
  end

Yes, we could transform the data here as we import it, but good RabbitMQ practice involves pretending that our messages are coming from somewhere external that we don’t have direct access to - even though we’re producing and consuming them ourselves in this instance.

Custom encoding the JSON

The bit you’re really here for.

Jason exposes its encoding through an Elixir protocol, Jason.Encoder, and we can take advantage of that by defining a custom implementation that controls how our struct is encoded.

Following on from our previously defined struct, we’re going to add a defimpl for the struct itself and specify what we want to do:

defmodule LegacyLog do
  defstruct [:id, :user_id, :legacy_log_id, :time, :notes, :date, :inserted_at, :updated_at]

  defimpl Jason.Encoder, for: LegacyLog do
    @impl Jason.Encoder
    def encode(value, opts) do
      {notes, remaining_log} = Map.pop(value, :notes)

      remaining_log
      |> Map.from_struct()
      |> Map.new(fn {k, v} -> {k, String.to_integer(v)} end)
      |> Map.put(:notes, notes)
      |> Jason.Encode.map(opts)
    end
  end
end

In the above code I’m defining an implementation of Jason.Encoder for the type LegacyLog, meaning that when a LegacyLog is passed to Jason.encode/2 (or Jason.encode!/2), Jason knows to call my custom code. Values of any other type are encoded by their own implementations, and the ones Jason ships with are fine for most use cases.
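If you’d like to see the implementation being picked up, the sketch below wraps the module in a standalone script. A few things here are my assumptions rather than part of the original setup: the Mix.install/2 call and its version requirement (inside a Mix project you would list Jason in your deps instead), the consolidate_protocols: false option (so that an implementation defined in the script itself is dispatched to), and the field values, which are made up.

```elixir
# Assumption: run as a standalone script, e.g. `elixir example.exs`.
Mix.install([{:jason, "~> 1.4"}], consolidate_protocols: false)

defmodule LegacyLog do
  defstruct [:id, :user_id, :legacy_log_id, :time, :notes, :date, :inserted_at, :updated_at]

  defimpl Jason.Encoder, for: LegacyLog do
    @impl Jason.Encoder
    def encode(value, opts) do
      # Keep :notes as a string; every other field is converted to an integer.
      {notes, remaining_log} = Map.pop(value, :notes)

      remaining_log
      |> Map.from_struct()
      |> Map.new(fn {k, v} -> {k, String.to_integer(v)} end)
      |> Map.put(:notes, notes)
      |> Jason.Encode.map(opts)
    end
  end
end

# Hypothetical record, with every value a string as it would be after a CSV read.
log = %LegacyLog{
  id: "42",
  user_id: "7",
  legacy_log_id: "3",
  time: "90",
  notes: "Read chapter two",
  date: "1658275200",
  inserted_at: "1658275200",
  updated_at: "1658361600"
}

# Jason.encode!/1 dispatches to the custom implementation above.
json = Jason.encode!(log)

# Decoding again shows integers everywhere except "notes".
decoded = Jason.decode!(json)
IO.inspect(decoded)
```

Note that String.to_integer/1 raises on nil or an empty string, so this sketch assumes every non-notes field is populated.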

The custom code itself is:

  • Popping the one field I want to remain a string out of the Struct and keeping the rest
  • Converting the remaining Struct into a Map
  • Iterating through each field and changing them into integer values
  • Putting the notes back
  • Doing the iodata generation using Jason.Encode, which guarantees valid JSON.

Flash lesson ⚡️ - Protocols

Protocols are a mechanism to achieve polymorphism in Elixir when you want behaviour to vary depending on the data type. They can be pretty powerful. Again, reach out if you’d like to hear more about Protocols.
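For a feel of the mechanism outside of Jason, here is a small sketch using only the standard library. The Describable protocol and the Note struct are hypothetical names I’ve made up for illustration; they aren’t part of Jason or of the import code.

```elixir
# Hypothetical protocol: one function, dispatched on the type of its argument.
defprotocol Describable do
  @doc "Returns a short human-readable description of the value."
  def describe(value)
end

# Hypothetical struct, to show that your own types can take part too.
defmodule Note do
  defstruct [:text]
end

defimpl Describable, for: Integer do
  def describe(value), do: "the integer #{value}"
end

defimpl Describable, for: BitString do
  def describe(value), do: "the string #{inspect(value)}"
end

defimpl Describable, for: Note do
  def describe(%Note{text: text}), do: "a note saying #{inspect(text)}"
end

# The same call picks a different implementation for each data type.
IO.puts(Describable.describe(42))
IO.puts(Describable.describe("hello"))
IO.puts(Describable.describe(%Note{text: "remember the milk"}))
```

Jason.Encoder works the same way: Jason defines the protocol, and a defimpl for your struct tells it what to do when it meets that type.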

We’re done!

Subscribe to my Substack below for similar content and follow me on Twitter for more Elixir (and general programming) tips.

fin
