Speed Test: JSONata's New Pure Python Library

What is JSONata?

Dealing with nested JSON can be challenging: loops, long chains of key lookups, and so on. There are various options in the community for easing this pain, but the one project I have always enjoyed is JSONata. It is easy to use and has strong transformation features, something lacking in libraries such as JMESPath.

However, because this project was written in JavaScript, the libraries for working with JSONata in Python were still based on JavaScript bindings. This meant slow performance due to the required Python-to-JavaScript overhead.

Intro to jsonata-python

Thankfully, a few months back, a new project was released: jsonata-python, a pure Python implementation of JSONata (link below).

GitHub - rayokota/jsonata-python: JSONata for Python

The next question is:

How much faster is jsonata-python?

Testing Overview

To test the new library, we will time the parsing and transformation of an 11k-line JSON document using the following:

  • Test #1 - jsonata (JSONata Python library - JavaScript bindings)
  • Test #2 - jsonata-python (JSONata Python library - Pure Python)
  • Test #3 - Pure Python

The Tests

Test Data

As previously discussed, our testing will involve a JSON file containing 11,000 lines of data. This JSON data was originally extracted from an NXOS device and then duplicated to increase the file length. Below is a snippet for reference:

{
  "jsonrpc": "2.0",
  "result": {
    "body": {
      "TABLE_interface": {
        "ROW_interface": [
          {
            "interface": "mgmt0",
            "state": "up",
            "admin_state": "up",
            "eth_hw_desc": "Ethernet",
            "eth_hw_addr": "5000.000c.0000",
            "eth_bia_addr": "5000.000c.0000",
            "eth_ip_addr": "172.29.151.2",
            "eth_ip_mask": 24,
            "eth_ip_prefix": "172.29.151.0",
            "eth_mtu": "1500",
            "eth_bw": 1000000,
            "eth_bw_str": "1000000 Kbit",
            "eth_dly": 10,
            "eth_reliability": "255",
            "eth_txload": "1",
            "eth_rxload": "1",
            "encapsulation": "ARPA",
            "medium": "broadcast",
            "eth_duplex": "full",
            "eth_speed": "1000 Mb/s",
            "eth_autoneg": "on",
            "eth_mdix": "off",
            "eth_ethertype": "0x0000",
            "vdc_lvl_in_avg_bits": 912,
            "vdc_lvl_in_avg_pkts": "1",
            "vdc_lvl_out_avg_bits": "32",
            "vdc_lvl_out_avg_pkts": "0",
            "vdc_lvl_in_pkts": 7201330,
            "vdc_lvl_in_ucast": "1717090",
            "vdc_lvl_in_mcast": "246484",
            "vdc_lvl_in_bcast": "5237756",
            "vdc_lvl_in_bytes": "592067893",
            "vdc_lvl_out_pkts": "1434511",
            "vdc_lvl_out_ucast": "1362476",
            "vdc_lvl_out_mcast": "72027",
            "vdc_lvl_out_bcast": "8",
            "vdc_lvl_out_bytes": "242896910"
          },
          {
            "interface": "Ethernet1/1",
            "state": "up",
            "admin_state": "up",
            "share_state": "Dedicated",
            "eth_hw_desc": "100/1000/10000 Ethernet",
            "eth_hw_addr": "500c.0000.1b08",
            "eth_bia_addr": "500c.0000.0101",
            "desc": "// Connected to leaf-1",
            "eth_ip_addr": "10.2.1.1",
            "eth_ip_mask": 30,
            "eth_ip_prefix": "10.2.1.0",
            "eth_mtu": "1500",
            "eth_bw": 1000000,
            "eth_bw_str": "1000000 Kbit",
            "eth_dly": 10,
            "eth_reliability": "254",
            "eth_txload": "1",
            "eth_rxload": "1",
            "encapsulation": "ARPA",
            "medium": "broadcast",
            "eth_duplex": "full",
            "eth_speed": "1000 Mb/s",
            "eth_beacon": "off",
            "eth_autoneg": "on",
            "eth_in_flowctrl": "off",
            "eth_out_flowctrl": "off",
            "eth_mdix": "off",
            "eth_swt_monitor": "off",
            "eth_ethertype": "0x8100",
            "eth_eee_state": "n/a",
            "eth_admin_fec_state": "auto",
            "eth_oper_fec_state": "off",
            "eth_link_flapped": "2week(s) 4day(s)",
            "eth_clear_counters": "never",
            "eth_reset_cntr": 1,
            "eth_load_interval1_rx": 30,
            "eth_inrate1_bits": "456",
            "eth_inrate1_pkts": "0",
            "eth_load_interval1_tx": "30",
            "eth_outrate1_bits": "232",
            "eth_outrate1_pkts": "9406384245761979392",
            "eth_inrate1_summary_bits": "456 bps",
            "eth_inrate1_summary_pkts": "0 pps",
            "eth_outrate1_summary_bits": "232 bps",
            "eth_outrate1_summary_pkts": "9406384128.00 Gpps",
            "eth_load_interval2_rx": "300",
            "eth_inrate2_bits": "624",
            "eth_inrate2_pkts": "0",
            "eth_load_interval2_tx": "300",
            "eth_outrate2_bits": "528",
            "eth_outrate2_pkts": "9532221731101444352",
            "eth_inrate2_summary_bits": "624 bps",
            "eth_inrate2_summary_pkts": "0 pps",
            "eth_outrate2_summary_bits": "528 bps",
            "eth_outrate2_summary_pkts": "9532221440.00 Gpps",
            "eth_inucast": 287801,
            "eth_inmcast": 183618442710906,
            "eth_inbcast": 17796917312826114000,
            "eth_inpkts": 17797100931269112000,
            "eth_inbytes": 137807247731,
            "eth_jumbo_inpkts": "17197692975229790875",
            "eth_storm_supp": "0",
            "eth_runts": 0,
            "eth_giants": 0,
            "eth_crc": "0",
            "eth_nobuf": 0,
            "eth_inerr": "18254342487522858739",
            "eth_frame": "0",
            "eth_overrun": "0",
...
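
The post does not show how the data was duplicated, so the following is only a sketch of one possible approach. It assumes the rows under ROW_interface are simply repeated; the function name inflate_interfaces and the copies parameter are hypothetical, and the sample is cut down to two fields per interface.

```python
import copy
import json


def inflate_interfaces(doc, copies):
    """Return a deep copy of doc with its ROW_interface list repeated `copies` times."""
    inflated = copy.deepcopy(doc)
    table = inflated["result"]["body"]["TABLE_interface"]
    # The comprehension is fully built before the assignment replaces the list.
    table["ROW_interface"] = [
        copy.deepcopy(row)
        for _ in range(copies)
        for row in table["ROW_interface"]
    ]
    return inflated


# Cut-down sample in the same shape as the snippet above.
sample = {
    "jsonrpc": "2.0",
    "result": {
        "body": {
            "TABLE_interface": {
                "ROW_interface": [
                    {"interface": "mgmt0", "eth_hw_addr": "5000.000c.0000"},
                    {"interface": "Ethernet1/1", "eth_hw_addr": "500c.0000.1b08"},
                ]
            }
        }
    },
}

big = inflate_interfaces(sample, copies=50)
rows = big["result"]["body"]["TABLE_interface"]["ROW_interface"]
line_count = len(json.dumps(big, indent=2).splitlines())
print(len(rows), "rows,", line_count, "lines when pretty-printed")
```

Raising copies until the pretty-printed output reaches the desired length gives a file comparable in size to the one used here.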

Test Expression

The expression used in our tests transforms the data into a new JSON object with two keys:

  • ip_addr_lookup - Contains the IPs for the interfaces matching a given MAC address.
  • all_macs - All MAC addresses from each of the interfaces.

Here is our full JSONata expression:

{
   "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
   "all_macs": $.**.eth_hw_addr
}
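
To make the `**` (descendant) operator more concrete, here is a rough pure Python sketch of what the expression asks for: visit every nested value, keep the objects, and pull out the two fields. It is an approximation for illustration, not how JSONata is implemented. The helper names descendants and transform are hypothetical, and unlike JSONata, which typically collapses a one-item result into a bare value, this sketch always returns lists.

```python
def descendants(node):
    """Yield node and every value nested inside it, depth first."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)
    elif isinstance(node, list):
        for item in node:
            yield from descendants(item)


def transform(doc, mac):
    """Approximate the two keys built by the JSONata expression."""
    objects = [n for n in descendants(doc) if isinstance(n, dict)]
    return {
        # Roughly: **[eth_hw_addr=mac].eth_ip_addr
        "ip_addr_lookup": [
            o["eth_ip_addr"]
            for o in objects
            if o.get("eth_hw_addr") == mac and "eth_ip_addr" in o
        ],
        # Roughly: $.**.eth_hw_addr
        "all_macs": [o["eth_hw_addr"] for o in objects if "eth_hw_addr" in o],
    }


# Two interfaces taken from the test data, reduced to the relevant fields.
sample = {
    "result": {
        "body": {
            "TABLE_interface": {
                "ROW_interface": [
                    {"eth_hw_addr": "5000.000c.0000", "eth_ip_addr": "172.29.151.2"},
                    {"eth_hw_addr": "500c.0000.1b08", "eth_ip_addr": "10.2.1.1"},
                ]
            }
        }
    }
}

result = transform(sample, "500c.0000.1b08")
print(result)
```

The point of the comparison is that the expression never names the path result.body.TABLE_interface.ROW_interface; the descent finds matching objects wherever they sit in the document.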

Test Code

Here is the code used for each of the test cases:

Test #1 - jsonata

import json
import timeit


def open_file(file):
    with open(file) as f:
        data = f.read()
    return data


import jsonata

jncontext = jsonata.Context()

input_data = open_file(file="data.json")

jsonata_expr = """{
   "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
   "all_macs": $.**.eth_hw_addr
}""".replace(
    "\n", ""
)


def do():
    result = jncontext(jsonata_expr, json.loads(input_data))
    return str(result)


time_taken = timeit.timeit(do, number=3)

print("Execution time:", time_taken)

Test #2 - jsonata-python

import json
from timeit import timeit

import jsonata


def open_file(file):
    with open(file) as f:
        data = f.read()
    return data


input_data = open_file(file="data.json")

# Parse the JSON once, outside the timed function.
input_data = json.loads(input_data)


jsonata_expr = """
${
   "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
   "all_macs": $.**.eth_hw_addr
}
"""


def do():
    expr = jsonata.Jsonata(jsonata_expr)
    result = expr.evaluate(input_data)
    return str(result)


time_taken = timeit(do, number=3)

print("Execution time:", time_taken)

Test #3 - Pure Python

import json
from timeit import timeit


def open_file(filename):
    """Open and read JSON data from a file."""
    with open(filename, "r") as file:
        return json.load(file)


def do():
    data = open_file("data.json")
    interfaces = data["result"]["body"]["TABLE_interface"]["ROW_interface"]

    eth_hw_addr_to_ip = {}
    all_eth_hw_addrs = []

    for iface in interfaces:
        eth_hw_addr = iface.get("eth_hw_addr")
        eth_ip_addr = iface.get("eth_ip_addr")

        if eth_hw_addr:
            all_eth_hw_addrs.append(eth_hw_addr)
            if eth_ip_addr:
                if eth_hw_addr not in eth_hw_addr_to_ip:
                    eth_hw_addr_to_ip[eth_hw_addr] = []
                eth_hw_addr_to_ip[eth_hw_addr].append(eth_ip_addr)

    return {
        "ip_addr_lookup": eth_hw_addr_to_ip.get("500c.0000.1b08", []),
        "all_macs": all_eth_hw_addrs,
    }


time_taken = timeit(do, number=3)

print("Execution time:", time_taken)
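
All three scripts call timeit with number=3, so the value printed is the total time for three calls, not the time per call. As a sketch of an alternative way to report the numbers, timeit.repeat can run several batches and the best batch can be divided by the call count. The workload function below is a stand-in for any of the do() functions above and is not part of the original tests.

```python
from timeit import repeat


def workload():
    """Placeholder for one of the do() functions under test."""
    return sum(i * i for i in range(10_000))


NUMBER = 3   # calls per batch, matching the tests above
BATCHES = 5  # independent batches to run

# repeat() returns one total (in seconds) per batch.
totals = repeat(workload, number=NUMBER, repeat=BATCHES)

# The minimum is usually the least disturbed by other activity on the machine.
best_per_call = min(totals) / NUMBER
print(f"best per-call time: {best_per_call:.6f} s")
```

Taking the minimum across batches is a common convention for micro-benchmarks; it does not change the relative ordering reported in this post.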

Testing Results

The table below shows the execution time for each test (total for 3 runs, in seconds):

Test     Library          Time (s)
Test #1  jsonata          14.011395630999687
Test #2  jsonata-python   0.9429438649967778
Test #3  Pure Python      0.009014303999720141
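
The relative speedups follow directly from these timings. The short sketch below divides each pair of measured times; the speedup helper is just an illustrative name.

```python
# Measured totals from the tests above, in seconds.
timings = {
    "jsonata": 14.011395630999687,
    "jsonata-python": 0.9429438649967778,
    "Pure Python": 0.009014303999720141,
}


def speedup(slower, faster):
    """How many times faster `faster` ran compared with `slower`."""
    return timings[slower] / timings[faster]


pairs = [
    ("jsonata", "jsonata-python"),
    ("jsonata-python", "Pure Python"),
    ("jsonata", "Pure Python"),
]
for slower, faster in pairs:
    print(f"{faster} vs {slower}: {speedup(slower, faster):.1f}x faster")
```

This works out to roughly 15x for jsonata-python over jsonata, roughly 105x for pure Python over jsonata-python, and roughly 1,554x for pure Python over jsonata.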

Result Summary

The new JSONata library is significantly faster (roughly 15 times faster) than the previous Python library for working with JSONata. Pure Python is quicker still for parsing and transformation (about 105 times faster than jsonata-python and about 1,554 times faster than the JavaScript-bindings library), but jsonata-python simplifies your development by allowing you to use an expression language for your parsing and transformation instead of native Python code.

Therefore, if you need a tool to handle nested JSON, jsonata-python is an excellent choice.

Looking to Learn More?

Want to learn more about JSONata? If so, check out our members' tech session where we deep-dive into JSONata.
