Deserializing JSON
Intro
A common workflow when writing software is to take data received from an external source, such as an API response or a file, and deserialize it into an in-memory object. The reverse process is just as common: serializing an in-memory object to transfer it over the network or persist it to a file. Arguably the most widespread data format for this is JSON, thanks to its simplicity and human-readable contents.
We'll show how to work with JSON data in TypeScript, Python, Java and Go, making some observations along the way. More specifically, we'll focus on deserialization using the standard library where possible to keep the code snippets simple, although in production you may prefer other libraries, for example to construct HTTP requests.
Throughout this post we'll use the fake API from JSONPlaceholder as an example. The JSON todo data loaded from https://jsonplaceholder.typicode.com/todos/1 looks as follows:
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}
Untyped deserialization
To start with, let's look at loading JSON data in an untyped way. This can be useful in quick one-off scripts or in dynamic situations where the exact schema of the data isn't known up front.
TypeScript
Given that JSON is a syntactic subset of JavaScript (hence the name JavaScript Object Notation), it's unsurprising that working with JSON is straightforward. When using the Fetch API, Response.json() parses the response body as JSON and resolves to the resulting JavaScript value. Alternatively, JSON.parse() may be used when working with string content, such as a loaded file.
Note that JavaScript's number type is a double-precision (IEEE 754) floating point number, whereas the JSON spec does not require a specific precision. In practice, for interoperability, JSON numbers are generally expected to fit within double precision, and large numbers (such as 64-bit IDs) are often sent as strings instead to avoid ambiguity over loss of precision.
const response = await fetch("https://jsonplaceholder.typicode.com/todos/1");
const data = await response.json(); // any
const id = data.id; // any
const userId = data.userId; // any
const title = data.title; // any
const completed = data.completed; // any
const doesNotExist = data.doesNotExist; // any, undefined at runtime!
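The precision note above can be reproduced outside the browser. Python's float is also an IEEE 754 double, so passing parse_int=float to json.loads() roughly mimics how a JavaScript client reads a large integer, whereas Python's default arbitrary-precision int keeps it exact. A minimal sketch; the ID value and the idAsString field are made up for illustration.

```python
import json

# 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
payload = '{"id": 9007199254740993, "idAsString": "9007199254740993"}'

# Default behaviour: JSON integers become Python ints (arbitrary precision).
exact = json.loads(payload)

# parse_int=float forces every integer literal through a double, like a JavaScript number.
as_double = json.loads(payload, parse_int=float)

print(exact["id"])                   # 9007199254740993
print(int(as_double["id"]))          # 9007199254740992 (precision silently lost)
print(int(as_double["idAsString"]))  # 9007199254740993 (the string keeps it exact)
```

Sending large identifiers as strings, as in the last line, is the workaround mentioned above.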
Python
The Python dictionary is the analog of the JavaScript object for working with key-value data. As a dynamically typed language, Python handles untyped data naturally, although static type checking can be layered on top through type hints and mypy, similar to how TypeScript sits on top of JavaScript. We can use json.loads() to parse a string containing JSON data into a dictionary.
Note that JSON numbers are automatically deserialized to either the int or the float Python type depending on how the literal is written: numbers with a fractional part or an exponent (such as 1.0 or 1e3) become float, and all others become int.
import json
import urllib.request
response = urllib.request.urlopen("https://jsonplaceholder.typicode.com/todos/1")
data = json.loads(response.read().decode("utf-8")) # Any
id = data["id"] # Any
user_id = data["userId"] # Any
title = data["title"] # Any
completed = data["completed"] # Any
does_not_exist = data["doesNotExist"] # Any, KeyError at runtime!
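To see the number rule in action without the network call, here's a short sketch parsing inline JSON. It also shows the parse_float hook of json.loads(), which can swap in decimal.Decimal when exact decimal values matter (for example, prices); the payload values are made up for illustration.

```python
import json
from decimal import Decimal

# The literal's syntax decides the type: a fraction or exponent means float, even for 1.0.
values = json.loads("[1, 1.0, 1e3, 0.1]")
print([type(v).__name__ for v in values])  # ['int', 'float', 'float', 'float']

# parse_float replaces float for non-integer literals, e.g. Decimal for exact decimals.
exact = json.loads('{"price": 0.1}', parse_float=Decimal)
print(repr(exact["price"]))  # Decimal('0.1')
```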
Java
This example uses the standard HTTP client API introduced in Java 11 together with Jackson, one of the de facto standard libraries for handling JSON in Java. The ObjectMapper.readTree() method returns a JsonNode, allowing us to access arbitrary JSON properties in the tree structure.
Note that methods such as JsonNode.asInt() coerce the underlying value to an int, so JSON values other than numbers, such as booleans and strings, may be accepted. If the value cannot be coerced to an int, a default value is returned, which is 0 for asInt().
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
public class JavaExample {
    public static void main(String[] _args) throws URISyntaxException, IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("https://jsonplaceholder.typicode.com/todos/1"))
                .GET()
                .build();
        HttpResponse<InputStream> response = client.send(request, BodyHandlers.ofInputStream());
        ObjectMapper mapper = new ObjectMapper();
        JsonNode data = mapper.readTree(response.body());
        int id = data.get("id").asInt();
        int userId = data.get("userId").asInt();
        String title = data.get("title").asText();
        boolean completed = data.get("completed").asBoolean();
        JsonNode doesNotExist = data.get("doesNotExist"); // null at runtime!
    }
}
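The difference between lenient coercion and strict access can also be sketched in Python for comparison. The as_int helper below is hypothetical: it loosely approximates the behaviour described above for JsonNode.asInt() (numbers, booleans and numeric strings are coerced, anything else yields the default) and is not Jackson's implementation, so edge cases may differ.

```python
import json


def as_int(value, default=0):
    """Hypothetical helper: leniently coerce a decoded JSON value to int.

    Loosely modelled on the behaviour described for JsonNode.asInt();
    not Jackson's actual rules.
    """
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return 1 if value else 0
    if isinstance(value, (int, float)):
        try:
            return int(value)  # floats are truncated towards zero
        except (OverflowError, ValueError):  # infinity or NaN
            return default
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return default
    return default  # null, arrays and objects fall back to the default


data = json.loads('{"a": 42, "b": "7", "c": true, "d": "seven", "e": null}')
print([as_int(data[k]) for k in "abcde"])  # [42, 7, 1, 0, 0]
```

Note how "seven" and null both quietly become 0, which is indistinguishable from a genuine 0 in the data.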
Go
The empty interface interface{} in Go allows working with data of an unknown type. If we call json.Unmarshal() with a pointer to a map[string]interface{} value, we'll receive the deserialized JSON object with values of the following types:
- bool, for JSON booleans
- float64, for JSON numbers
- string, for JSON strings
- []interface{}, for JSON arrays
- map[string]interface{}, for JSON objects
- nil, for JSON null
Note that json.Unmarshal always deserializes numbers to float64 when the target is an interface value, so a type assertion to float64 followed by an explicit conversion to int is needed if an int is desired.
package main
import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)
func main() {
	response, err := http.Get("https://jsonplaceholder.typicode.com/todos/1")
	if err != nil {
		panic(err)
	}
	defer response.Body.Close()
	body, err := io.ReadAll(response.Body)
	if err != nil {
		panic(err)
	}
	var data map[string]interface{}
	err = json.Unmarshal(body, &data)
	if err != nil {
		panic(err)
	}
	id := data["id"].(float64)            // float64
	userId := data["userId"].(float64)    // float64
	title := data["title"].(string)       // string
	completed := data["completed"].(bool) // bool
	doesNotExist := data["doesNotExist"]  // interface{}, nil at runtime!
	// Go rejects unused local variables, so use them.
	fmt.Println(id, userId, title, completed, doesNotExist)
}
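For comparison with the Go mapping above, Python's json module decodes untyped JSON using its own fixed mapping. One notable difference is that Python distinguishes integers from floats, while Go's interface decoding always uses float64. A quick check with an inline payload (the keys are arbitrary):

```python
import json

doc = json.loads('{"b": true, "i": 1, "f": 1.5, "s": "x", "a": [1], "o": {}, "n": null}')
for key, value in doc.items():
    print(key, type(value).__name__)
# b bool
# i int
# f float
# s str
# a list
# o dict
# n NoneType
```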
Typed deserialization
Now let's consider how to deserialize JSON into typed objects, allowing our code to operate more safely and to automatically catch mistakes such as a typo in a field name.
TypeScript
An explicit type annotation tells the TypeScript compiler which type we want it to treat the object as. Now that the compiler knows which fields exist and what types they should have, it can catch incorrect access (according to the types it was told).
Note that this is compile-time only: the compiled JavaScript contains no extra validation or type conversions. The exact same JavaScript object is still used, which may cause errors at runtime if it was typed incorrectly or is subsequently mutated.
interface Todo {
  userId: number;
  id: number;
  title: string;
  completed: boolean;
}
const response = await fetch("https://jsonplaceholder.typicode.com/todos/1");
const data: Todo = await response.json();
const id = data.id; // number
const userId = data.userId; // number
const title = data.title; // string
const completed = data.completed; // boolean
const doesNotExist = data.doesNotExist; // compile error!
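To make the runtime risk concrete, here's a sketch in Python, which shares JavaScript's rule that non-empty strings are truthy. The malformed payload, where completed arrives as the string "false", is invented for illustration: a type annotation alone would not catch it, but an explicit runtime check does.

```python
import json

# Hypothetical malformed payload: "completed" arrives as a string, not a boolean.
todo = json.loads('{"userId": 1, "id": 1, "title": "x", "completed": "false"}')

print(type(todo["completed"]).__name__)  # str

# A non-empty string is truthy in Python (as in JavaScript),
# so this evaluates to True even though the value reads "false".
print(bool(todo["completed"]))  # True

# An explicit runtime check catches the mismatch instead of silently misbehaving.
if not isinstance(todo["completed"], bool):
    print("completed is not a boolean")
```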
Python
The TypedDict type allows a very similar type annotation. When using mypy to check static types, we achieve a similar level of protection at lint time.
Note that this is lint time only and, again, does not affect the code executed at runtime.
import json
import urllib.request
from typing import TypedDict
class Todo(TypedDict):
    id: int
    userId: int
    title: str
    completed: bool
response = urllib.request.urlopen("https://jsonplaceholder.typicode.com/todos/1")
data: Todo = json.loads(response.read().decode("utf-8")) # Todo
id = data["id"] # int
user_id = data["userId"] # int
title = data["title"] # str
completed = data["completed"] # bool
does_not_exist = data["doesNotExist"] # typeddict-item error at lint time!
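If runtime protection is wanted without a third-party library, one option is to reuse the TypedDict's annotations for a runtime check via typing.get_type_hints(). The check_todo helper below is a hypothetical minimal sketch that only handles plain classes such as int, str and bool (not nested or generic types).

```python
import json
from typing import TypedDict, get_type_hints


class Todo(TypedDict):
    id: int
    userId: int
    title: str
    completed: bool


def check_todo(value: object) -> Todo:
    """Hypothetical helper: validate decoded JSON against Todo's annotations at runtime."""
    if not isinstance(value, dict):
        raise TypeError("expected a JSON object")
    for key, expected in get_type_hints(Todo).items():
        if key not in value:
            raise KeyError(key)
        actual = value[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(actual, expected) or (expected is int and isinstance(actual, bool)):
            raise TypeError(f"{key}: expected {expected.__name__}, got {type(actual).__name__}")
    return value  # type: ignore[return-value]


# The annotation alone is not enforced: this is a plain dict with a string id.
unchecked: Todo = json.loads('{"id": "oops", "userId": 1, "title": "x", "completed": false}')
print(type(unchecked).__name__, unchecked["id"])  # dict oops

good = check_todo(json.loads('{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'))
print(good["title"])  # delectus aut autem

try:
    check_todo(unchecked)
except TypeError as err:
    print(err)  # id: expected int, got str
```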
Java
The first step is to define a class to deserialize data into. Here we define a simple Todo plain old Java object (POJO) with standard getters and setters. Jackson works with these classes out of the box, matching JSON property names to the names derived from the getter and setter methods. Jackson annotations such as @JsonProperty can customise the deserialization behaviour, for example to remap field names.
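Note that, unlike the TypeScript and Python examples, this does change runtime behaviour. Jackson creates new Todo objects rather than JsonNode objects and, by default, an ObjectMapper fails when it encounters unknown properties during deserialization (controlled by DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES).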
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
public class JavaExample {
    public static void main(String[] _args) throws URISyntaxException, IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("https://jsonplaceholder.typicode.com/todos/1"))
                .GET()
                .build();
        HttpResponse<InputStream> response = client.send(request, BodyHandlers.ofInputStream());
        ObjectMapper mapper = new ObjectMapper();
        Todo todo = mapper.readValue(response.body(), Todo.class);
        int id = todo.getId();
        int userId = todo.getUserId();
        String title = todo.getTitle();
        boolean completed = todo.getCompleted();
        JsonNode doesNotExist = todo.getDoesNotExist(); // compile error!
    }
    private static final class Todo {
        private int id;
        private int userId;
        private String title;
        private boolean completed;
        public int getId() {
            return id;
        }
        public void setId(int id) {
            this.id = id;
        }
        public int getUserId() {
            return userId;
        }
        public void setUserId(int userId) {
            this.userId = userId;
        }
        public String getTitle() {
            return title;
        }
        public void setTitle(String title) {
            this.title = title;
        }
        public boolean getCompleted() {
            return completed;
        }
        public void setCompleted(boolean completed) {
            this.completed = completed;
        }
    }
}
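For comparison, a Python dataclass gives a somewhat similar runtime effect: building one from the decoded dict creates a new object, and an unrecognised key raises TypeError, loosely like Jackson's default failure on unknown properties. A minimal sketch; the field names keep the JSON's camelCase for simplicity, and the extra priority key is invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Todo:
    # Field names mirror the JSON keys (camelCase kept for simplicity).
    userId: int
    id: int
    title: str
    completed: bool


payload = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'
todo = Todo(**json.loads(payload))  # a new Todo instance, not the parsed dict
print(todo)  # Todo(userId=1, id=1, title='delectus aut autem', completed=False)

# An unrecognised key fails loudly, loosely like Jackson's default behaviour.
try:
    Todo(**json.loads('{"userId": 1, "id": 1, "title": "x", "completed": false, "priority": 2}'))
except TypeError as err:
    print(f"rejected: {err}")
```

Unlike Jackson, the dataclass does not check field types at runtime, so a string id would still be accepted.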
Go
Instead of deserializing into the empty interface, we can define a struct and deserialize into that. JSON keys are matched to exported struct field names, preferring an exact match but also accepting a case-insensitive one. Alternative field names can be specified with extra metadata in struct tags, such as Field int `json:"myName"`.
Note that, similar to Java, this does have an effect at runtime, for example deserializing numbers as ints rather than float64. Unlike Jackson's default, however, unknown JSON fields are silently ignored unless a json.Decoder with DisallowUnknownFields() is used.
package main
import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)
type Todo struct {
	Id        int    `json:"id"`
	UserId    int    `json:"userId"`
	Title     string `json:"title"`
	Completed bool   `json:"completed"`
}
func main() {
	response, err := http.Get("https://jsonplaceholder.typicode.com/todos/1")
	if err != nil {
		panic(err)
	}
	defer response.Body.Close()
	body, err := io.ReadAll(response.Body)
	if err != nil {
		panic(err)
	}
	var data Todo
	err = json.Unmarshal(body, &data)
	if err != nil {
		panic(err)
	}
	id := data.Id                     // int
	userId := data.UserId             // int
	title := data.Title               // string
	completed := data.Completed       // bool
	doesNotExist := data.DoesNotExist // compile error!
	fmt.Println(id, userId, title, completed)
}
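The case-insensitive matching is easy to overlook, so here's a Python sketch that imitates it. The decode_into helper is hypothetical and only approximates encoding/json: each JSON key is matched to a field exactly if possible, otherwise case-insensitively (using str.lower rather than Go's case folding), unknown keys are ignored, and missing fields keep zero-like defaults as they would in a Go struct. The upper-case payload keys are invented for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Todo:
    # Defaults play the role of Go's zero values for fields absent from the JSON.
    id: int = 0
    userId: int = 0
    title: str = ""
    completed: bool = False


def decode_into(cls, raw: str):
    """Hypothetical helper approximating Go's key matching for a flat dataclass."""
    names = [f.name for f in fields(cls)]
    kwargs = {}
    for key, value in json.loads(raw).items():
        if key in names:  # exact match first
            kwargs[key] = value
            continue
        match = next((name for name in names if name.lower() == key.lower()), None)
        if match is not None:  # otherwise accept a case-insensitive match
            kwargs[match] = value
        # keys with no matching field are silently ignored
    return cls(**kwargs)


todo = decode_into(Todo, '{"ID": 1, "USERID": 1, "Title": "x", "extra": true}')
print(todo)  # Todo(id=1, userId=1, title='x', completed=False)
```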
Data constraints
Going further than simply parsing, we may want to enforce data constraints relevant to the objects in our data model.
Some of these are inherent to the deserialization itself, such as:
- Treat missing or null values for array fields as an empty array
- Remove null elements inside arrays
- Fail on extra fields sent that we do not recognise
This category of data constraint can often be tuned through configuration, for example with Jackson's DeserializationFeature.
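As a sketch of the deserialization-level constraints listed above, here's a hypothetical normalise function in Python applied to an already-decoded object. It assumes a todo variant with an extra tags array field, invented for illustration, and a hard-coded allow-list of keys.

```python
import json

# Hypothetical allow-list: the example todo fields plus an invented "tags" array.
ALLOWED_KEYS = {"userId", "id", "title", "completed", "tags"}


def normalise(obj: dict) -> dict:
    """Return a copy of obj with deserialization-level constraints applied."""
    unknown = set(obj) - ALLOWED_KEYS
    if unknown:  # fail on extra fields we do not recognise
        raise ValueError(f"unrecognised fields: {sorted(unknown)}")
    result = dict(obj)
    tags = result.get("tags")
    if tags is None:  # treat missing or null arrays as empty
        tags = []
    if not isinstance(tags, list):
        raise TypeError("tags must be an array")
    result["tags"] = [tag for tag in tags if tag is not None]  # drop null elements
    return result


print(normalise(json.loads('{"id": 1, "tags": null}')))              # {'id': 1, 'tags': []}
print(normalise(json.loads('{"id": 1, "tags": ["a", null, "b"]}')))  # {'id': 1, 'tags': ['a', 'b']}
```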
Others relate to business logic or are necessary for the object as a whole to make sense, for example invariants on or between fields:
- Max length of a title string
- Non-negative integer value for a page size
- Value matches an expected format, such as a book's ISBN
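Business-level invariants like these are often enforced when the object is constructed. Here's a Python sketch using dataclasses with __post_init__ checks; the Book and PageRequest types and the 200-character title limit are hypothetical, and the ISBN check implements the standard ISBN-13 check-digit rule (weights alternating 1 and 3, total divisible by 10).

```python
import json
import re
from dataclasses import dataclass

MAX_TITLE_LENGTH = 200  # hypothetical limit for illustration


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13 check: 13 digits whose weighted sum (weights 1, 3, 1, 3, ...) is divisible by 10."""
    digits = isbn.replace("-", "")
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


@dataclass(frozen=True)
class Book:
    title: str
    isbn: str

    def __post_init__(self) -> None:
        # Invariants are checked once, at construction time.
        if len(self.title) > MAX_TITLE_LENGTH:
            raise ValueError(f"title longer than {MAX_TITLE_LENGTH} characters")
        if not is_valid_isbn13(self.isbn):
            raise ValueError(f"invalid ISBN-13: {self.isbn!r}")


@dataclass(frozen=True)
class PageRequest:
    page_size: int

    def __post_init__(self) -> None:
        if isinstance(self.page_size, bool) or not isinstance(self.page_size, int):
            raise TypeError("page_size must be an integer")
        if self.page_size < 0:
            raise ValueError("page_size must be non-negative")


book = Book(**json.loads('{"title": "Example", "isbn": "978-0-306-40615-7"}'))
print(book)

try:
    PageRequest(**json.loads('{"page_size": -1}'))
except ValueError as err:
    print(err)  # page_size must be non-negative
```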
For TypeScript and Python, these require extra runtime code, bringing deserialization closer to the behaviour of the Java and Go examples. Ideas to explore in this direction are:
- TypeScript
- Python
- Java
  - JSR 380 and the jakarta.validation.constraints annotations
  - Immutables with precondition check methods
- Go
JSON Schema is also a standard for defining the schema of JSON data, with validator implementations available for most popular programming languages.
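To give a feel for what a schema looks like, here's a JSON Schema for the todo payload written as a Python dict, together with a deliberately tiny toy validator. The toy understands only the type, properties, required, additionalProperties, minimum and maxLength keywords, and treats integer more strictly than the spec (floats such as 1.0 are rejected); the minimum and maxLength values are arbitrary. A real project would use one of the full validator implementations instead.

```python
import json

# A JSON Schema describing the todo payload (standard keywords only).
TODO_SCHEMA = {
    "type": "object",
    "properties": {
        "userId": {"type": "integer", "minimum": 1},
        "id": {"type": "integer", "minimum": 1},
        "title": {"type": "string", "maxLength": 200},
        "completed": {"type": "boolean"},
    },
    "required": ["userId", "id", "title", "completed"],
    "additionalProperties": False,
}

# Only the type names used above are supported (bool is excluded from integer).
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
}


def toy_validate(instance, schema, path="$"):
    """Toy validator for a handful of keywords; returns a list of error strings."""
    if "type" in schema and not _TYPE_CHECKS[schema["type"]](instance):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    if "maxLength" in schema and len(instance) > schema["maxLength"]:
        errors.append(f"{path}: longer than {schema['maxLength']}")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing {key}")
        for key, value in instance.items():
            if key in props:
                errors.extend(toy_validate(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected {key}")
    return errors


todo = json.loads('{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}')
print(toy_validate(todo, TODO_SCHEMA))  # []
```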
Conclusion
One of the main purposes of JSON is interoperability across software boundaries. It's interesting to take a fundamental task such as deserializing JSON and compare side by side how different languages and their libraries approach it.
Notably, Java stands out as requiring an external library to perform basic parsing, and potentially further libraries to cut down boilerplate through code generation of standard getters and setters. Personally, I'm a fan of the mix of conciseness and out-of-the-box safety that Go provides here, without much work setting up additional tools.