Hey everyone! This is going to be a quick post covering something I’ve encountered a number of times and seen other developers struggle with: JSON encoding custom data structures in Python.

Suppose you have some custom data type you’re working with:

class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo

Also suppose that at some point in your program, you need to serialize this data (e.g., using json.dump or json.dumps). If you try it, you’ll probably get an error like TypeError: Object of type MyCustomDataType is not JSON serializable. This is because the json module doesn’t know how to serialize this type.
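If you want to see the failure for yourself, here’s a minimal sketch that triggers it. The try_to_serialize helper is a hypothetical name I made up for this illustration, and the exact wording of the error message can vary between Python versions.

```python
import json


class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo


def try_to_serialize(obj):
    """Hypothetical helper: return the JSON string, or the error text on failure."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Built-in types such as dict and int serialize without any help.
print(try_to_serialize({"foo": 123}))
# A custom type makes json.dumps raise TypeError.
print(try_to_serialize(MyCustomDataType(123)))
```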

This is a pretty easy fix: we can tell the json module how to serialize our type by giving our serializing function a custom encoder. The encoder should do the following:

  • Inherit from json.JSONEncoder.
  • Override the default method in a way that lets unknown types flow through (to preserve default behavior).

One implementation might look like this:

import json


class CustomEncoder(json.JSONEncoder):
    """Custom encoder for handling MyCustomDataType objects."""
    def default(self, o):
        if isinstance(o, MyCustomDataType):
            data = o.to_dict()
            data["__my_custom_data_type__"] = True
            return data
        return super().default(o)

We’ll need to update our class to implement the to_dict method (or whatever you want to name your serialization method). Now when we’re serializing, we just need to pass CustomEncoder as the cls argument. This is also nice because we can easily add additional serialization functionality to our CustomEncoder if we want to; you can just add another type “handler” before returning super().default(o). Here’s an example of how you’d serialize this now:

class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo

    def to_dict(self):
        return {"foo": self.foo}

def main():
    mcd = MyCustomDataType(123)
    json.dumps(mcd, cls=CustomEncoder)
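To make the “another type handler” idea concrete, here’s a rough sketch of the same encoder with a second branch for datetime objects. The __datetime__ flag and the "iso" key are names I picked purely for illustration; nothing in the json module requires them.

```python
import datetime
import json


class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo

    def to_dict(self):
        return {"foo": self.foo}


class CustomEncoder(json.JSONEncoder):
    """Custom encoder with two type handlers."""
    def default(self, o):
        if isinstance(o, MyCustomDataType):
            data = o.to_dict()
            data["__my_custom_data_type__"] = True
            return data
        # Second handler (illustrative assumption): tag datetimes and
        # store them as ISO 8601 strings.
        if isinstance(o, datetime.datetime):
            return {"__datetime__": True, "iso": o.isoformat()}
        # Anything else falls through to the default behavior (TypeError).
        return super().default(o)


payload = {
    "item": MyCustomDataType(123),
    "when": datetime.datetime(2020, 1, 2, 3, 4, 5),
}
encoded = json.dumps(payload, cls=CustomEncoder, sort_keys=True)
print(encoded)
```

Each handler returns something json already knows how to encode, so the order of the branches only matters if one of your types inherits from another.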

Ok, easy enough. The marginally trickier thing is deserializing our custom type. When json is parsing some data, how will it know that some arbitrary object should deserialize as one of our MyCustomDataType instances? How do we even know?? The answer is that dunder field we added before serializing: __my_custom_data_type__. This acts as a flag to our parser that basically says “I am this special type, parse me accordingly”. In order to leverage that field, we just need to hook into the parser and look for it.

The typical way to do this is with object_hook or object_pairs_hook. I’ve gotten into the habit of using the latter by default because 1) if you supply both, then object_pairs_hook takes precedence and 2) if you’re doing something where the order of the keys matters (e.g., creating an OrderedDict in Python 2), then you need to use the pairs hook anyway. The only difference is the argument each hook receives: the function you assign to object_hook gets a dict, whereas the function you assign to object_pairs_hook gets a list of tuples in which the first element of each tuple is a key and the second element is the corresponding value.
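If the difference between the two hooks feels abstract, this small sketch records what each one receives while parsing the same text. The record_dict and record_pairs functions are hypothetical names used only for this demonstration.

```python
import json

text = '{"b": 1, "a": {"c": 2}}'

seen_by_object_hook = []
seen_by_pairs_hook = []


def record_dict(obj):
    """object_hook receives each decoded JSON object as a dict."""
    seen_by_object_hook.append(obj)
    return obj


def record_pairs(pairs):
    """object_pairs_hook receives each JSON object as a list of (key, value) pairs."""
    seen_by_pairs_hook.append(pairs)
    return dict(pairs)


json.loads(text, object_hook=record_dict)
json.loads(text, object_pairs_hook=record_pairs)

# Both hooks fire once per JSON object, innermost object first.
print(seen_by_object_hook)
print(seen_by_pairs_hook)
```

Note that the hook runs for nested objects before the objects that contain them, which is what lets a flagged object deep inside a document get converted before its parent is built.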

Here’s how you’d write your custom parser:

def custom_object_pairs_hook(items):
    """Custom decoder hook for handling MyCustomDataType objects."""
    data = dict(items)  # build a dict from the (key, value) pairs
    if data.pop("__my_custom_data_type__", None):
        return MyCustomDataType.from_dict(data)
    return data

You also need to update MyCustomDataType so that it implements the from_dict method. It would arguably be simpler to just call the constructor directly from the hook, but I’ve found that this from_dict indirection is almost always needed anyway, so I just favor that approach by default. Altogether, it might look something like this:

"""Testing custom data type (de)serialization."""
import json


class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo

    def to_dict(self):
        return {"foo": self.foo}

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


class CustomEncoder(json.JSONEncoder):
    """Custom encoder for handling MyCustomDataType objects."""
    def default(self, o):
        if isinstance(o, MyCustomDataType):
            data = o.to_dict()
            data["__my_custom_data_type__"] = True
            return data
        return super().default(o)


def custom_object_pairs_hook(items):
    """Custom decoder hook for handling MyCustomDataType objects."""
    data = dict(items)  # build a dict from the (key, value) pairs
    if data.pop("__my_custom_data_type__", None):
        return MyCustomDataType.from_dict(data)
    return data


def main():
    mcd = MyCustomDataType(123)
    mcd2 = json.loads(
        json.dumps(mcd, cls=CustomEncoder),
        object_pairs_hook=custom_object_pairs_hook,
    )
    print(mcd2.foo)  # the round-tripped object keeps the original foo value


if __name__ == "__main__":
    main()
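As a quick sanity check, here’s a sketch that round-trips instances nested inside a dict and a list, using the same class, encoder, and hook as above. The sample payload is made up for illustration; the point is that the hook runs for every JSON object in the document, so nested instances come back as MyCustomDataType too.

```python
import json


class MyCustomDataType(object):
    def __init__(self, foo):
        self.foo = foo

    def to_dict(self):
        return {"foo": self.foo}

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


class CustomEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, MyCustomDataType):
            data = o.to_dict()
            data["__my_custom_data_type__"] = True
            return data
        return super().default(o)


def custom_object_pairs_hook(items):
    data = dict(items)
    # Remove the flag so it isn't passed on to from_dict.
    if data.pop("__my_custom_data_type__", None):
        return MyCustomDataType.from_dict(data)
    return data


original = {
    "single": MyCustomDataType(123),
    "many": [MyCustomDataType("a"), MyCustomDataType("b")],
}
restored = json.loads(
    json.dumps(original, cls=CustomEncoder),
    object_pairs_hook=custom_object_pairs_hook,
)

# Plain dicts stay dicts; flagged objects become MyCustomDataType again.
print(type(restored).__name__, type(restored["single"]).__name__)
print(restored["single"].foo, [item.foo for item in restored["many"]])
```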

That’s it for this short and sweet post; see you next time when I’ll talk about a different kind of encoding!