What is JSONata?
Dealing with nested JSON can be challenging: loops, complex chains of key lookups, and so on. There are various community options for easing this pain, but the one project I have always enjoyed is JSONata, a lightweight query and transformation language for JSON data. It's easy to use and has strong transformation features, something lacking in libraries such as JMESPath.
However, because JSONata is written in JavaScript, the existing Python libraries for it relied on JavaScript bindings. This meant slower performance due to the overhead of passing data between Python and JavaScript.
Intro to jsonata-python
Thankfully, a few months ago, a new project called jsonata-python was released: a pure Python implementation of JSONata that needs no JavaScript runtime.
The next question is: how much faster is jsonata-python?
Testing Overview
To evaluate the new library, we will time the parsing and transformation of an 11,000-line JSON document using three approaches:
- Test #1: jsonata (JSONata Python library, JavaScript bindings)
- Test #2: jsonata-python (JSONata Python library, pure Python)
- Test #3: Pure Python (native Python code, no JSONata)
The Tests
Test Data
As previously discussed, our testing will involve a JSON file containing 11,000 lines of data. This JSON data was originally extracted from an NXOS device and then duplicated to increase the file length. Below is a snippet for reference:
```json
{
  "jsonrpc": "2.0",
  "result": {
    "body": {
      "TABLE_interface": {
        "ROW_interface": [
          {
            "interface": "mgmt0",
            "state": "up",
            "admin_state": "up",
            "eth_hw_desc": "Ethernet",
            "eth_hw_addr": "5000.000c.0000",
            "eth_bia_addr": "5000.000c.0000",
            "eth_ip_addr": "172.29.151.2",
            "eth_ip_mask": 24,
            "eth_ip_prefix": "172.29.151.0",
            "eth_mtu": "1500",
            "eth_bw": 1000000,
            "eth_bw_str": "1000000 Kbit",
            "eth_dly": 10,
            "eth_reliability": "255",
            "eth_txload": "1",
            "eth_rxload": "1",
            "encapsulation": "ARPA",
            "medium": "broadcast",
            "eth_duplex": "full",
            "eth_speed": "1000 Mb/s",
            "eth_autoneg": "on",
            "eth_mdix": "off",
            "eth_ethertype": "0x0000",
            "vdc_lvl_in_avg_bits": 912,
            "vdc_lvl_in_avg_pkts": "1",
            "vdc_lvl_out_avg_bits": "32",
            "vdc_lvl_out_avg_pkts": "0",
            "vdc_lvl_in_pkts": 7201330,
            "vdc_lvl_in_ucast": "1717090",
            "vdc_lvl_in_mcast": "246484",
            "vdc_lvl_in_bcast": "5237756",
            "vdc_lvl_in_bytes": "592067893",
            "vdc_lvl_out_pkts": "1434511",
            "vdc_lvl_out_ucast": "1362476",
            "vdc_lvl_out_mcast": "72027",
            "vdc_lvl_out_bcast": "8",
            "vdc_lvl_out_bytes": "242896910"
          },
          {
            "interface": "Ethernet1/1",
            "state": "up",
            "admin_state": "up",
            "share_state": "Dedicated",
            "eth_hw_desc": "100/1000/10000 Ethernet",
            "eth_hw_addr": "500c.0000.1b08",
            "eth_bia_addr": "500c.0000.0101",
            "desc": "// Connected to leaf-1",
            "eth_ip_addr": "10.2.1.1",
            "eth_ip_mask": 30,
            "eth_ip_prefix": "10.2.1.0",
            "eth_mtu": "1500",
            "eth_bw": 1000000,
            "eth_bw_str": "1000000 Kbit",
            "eth_dly": 10,
            "eth_reliability": "254",
            "eth_txload": "1",
            "eth_rxload": "1",
            "encapsulation": "ARPA",
            "medium": "broadcast",
            "eth_duplex": "full",
            "eth_speed": "1000 Mb/s",
            "eth_beacon": "off",
            "eth_autoneg": "on",
            "eth_in_flowctrl": "off",
            "eth_out_flowctrl": "off",
            "eth_mdix": "off",
            "eth_swt_monitor": "off",
            "eth_ethertype": "0x8100",
            "eth_eee_state": "n/a",
            "eth_admin_fec_state": "auto",
            "eth_oper_fec_state": "off",
            "eth_link_flapped": "2week(s) 4day(s)",
            "eth_clear_counters": "never",
            "eth_reset_cntr": 1,
            "eth_load_interval1_rx": 30,
            "eth_inrate1_bits": "456",
            "eth_inrate1_pkts": "0",
            "eth_load_interval1_tx": "30",
            "eth_outrate1_bits": "232",
            "eth_outrate1_pkts": "9406384245761979392",
            "eth_inrate1_summary_bits": "456 bps",
            "eth_inrate1_summary_pkts": "0 pps",
            "eth_outrate1_summary_bits": "232 bps",
            "eth_outrate1_summary_pkts": "9406384128.00 Gpps",
            "eth_load_interval2_rx": "300",
            "eth_inrate2_bits": "624",
            "eth_inrate2_pkts": "0",
            "eth_load_interval2_tx": "300",
            "eth_outrate2_bits": "528",
            "eth_outrate2_pkts": "9532221731101444352",
            "eth_inrate2_summary_bits": "624 bps",
            "eth_inrate2_summary_pkts": "0 pps",
            "eth_outrate2_summary_bits": "528 bps",
            "eth_outrate2_summary_pkts": "9532221440.00 Gpps",
            "eth_inucast": 287801,
            "eth_inmcast": 183618442710906,
            "eth_inbcast": 17796917312826114000,
            "eth_inpkts": 17797100931269112000,
            "eth_inbytes": 137807247731,
            "eth_jumbo_inpkts": "17197692975229790875",
            "eth_storm_supp": "0",
            "eth_runts": 0,
            "eth_giants": 0,
            "eth_crc": "0",
            "eth_nobuf": 0,
            "eth_inerr": "18254342487522858739",
            "eth_frame": "0",
            "eth_overrun": "0",
            ...
```
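The article only says that the device output was "duplicated" to reach 11,000 lines. The sketch below shows one way such a file could be produced. It assumes the duplication simply repeats the entries in `ROW_interface`. The helper `inflate_rows` is a name introduced here for illustration and is not part of any library.

```python
import copy
import json


def inflate_rows(doc: dict, copies: int) -> dict:
    """Hypothetical helper: repeat the ROW_interface entries `copies` times.

    The input document is left untouched; a modified deep copy is returned.
    """
    inflated = copy.deepcopy(doc)
    table = inflated["result"]["body"]["TABLE_interface"]
    rows = table["ROW_interface"]
    # Deep-copy every duplicate so the rows stay independent objects.
    table["ROW_interface"] = [copy.deepcopy(row) for _ in range(copies) for row in rows]
    return inflated


if __name__ == "__main__":
    # A trimmed-down stand-in for the NX-OS response shown above.
    sample = {
        "jsonrpc": "2.0",
        "result": {
            "body": {
                "TABLE_interface": {
                    "ROW_interface": [
                        {"interface": "mgmt0", "eth_hw_addr": "5000.000c.0000"},
                        {"interface": "Ethernet1/1", "eth_hw_addr": "500c.0000.1b08"},
                    ]
                }
            }
        },
    }
    big = inflate_rows(sample, copies=3)
    rows = big["result"]["body"]["TABLE_interface"]["ROW_interface"]
    print(len(rows))  # 6
    # Pretty-printing spreads the object over many lines.
    text = json.dumps(big, indent=2)
    print(text.count("\n") + 1, "lines")
```

To write the result to disk, `json.dump(big, f, indent=2)` on an open `data.json` handle would produce a similarly pretty-printed file.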
Test Expression
The expression we will use for our tests transforms our data into a new JSON object with two keys:
- ip_addr_lookup: the IP addresses of the interfaces matching a given MAC address.
- all_macs: all MAC addresses across the interfaces.
Here is our full JSONata expression:
```
{
  "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
  "all_macs": $.**.eth_hw_addr
}
```
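In this expression, `**` is JSONata's descendant wildcard. It walks every value nested under the context at any depth. The `[eth_hw_addr="500c.0000.1b08"]` predicate then keeps only the objects whose `eth_hw_addr` matches. The plain-Python sketch below loosely approximates those semantics for illustration. The function names are hypothetical and are not part of jsonata-python. One known difference is that JSONata returns a single value rather than a one-element array when only one item matches, whereas this sketch always returns a list.

```python
from typing import Any, Iterator, List


def descendants(node: Any) -> Iterator[Any]:
    """Yield `node` and every value nested under it, depth-first.

    Arrays are flattened rather than yielded themselves, loosely
    mirroring how JSONata's ** wildcard treats sequences.
    """
    if isinstance(node, list):
        for item in node:
            yield from descendants(item)
        return
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)


def ip_addr_lookup(doc: Any, mac: str) -> List[Any]:
    # Approximates: **[eth_hw_addr=mac].eth_ip_addr
    return [
        d["eth_ip_addr"]
        for d in descendants(doc)
        if isinstance(d, dict) and d.get("eth_hw_addr") == mac and "eth_ip_addr" in d
    ]


def all_macs(doc: Any) -> List[Any]:
    # Approximates: $.**.eth_hw_addr
    return [d["eth_hw_addr"] for d in descendants(doc) if isinstance(d, dict) and "eth_hw_addr" in d]


if __name__ == "__main__":
    doc = {"result": {"body": {"TABLE_interface": {"ROW_interface": [
        {"interface": "mgmt0", "eth_hw_addr": "5000.000c.0000", "eth_ip_addr": "172.29.151.2"},
        {"interface": "Ethernet1/1", "eth_hw_addr": "500c.0000.1b08", "eth_ip_addr": "10.2.1.1"},
    ]}}}}
    print({
        "ip_addr_lookup": ip_addr_lookup(doc, "500c.0000.1b08"),
        "all_macs": all_macs(doc),
    })
    # {'ip_addr_lookup': ['10.2.1.1'], 'all_macs': ['5000.000c.0000', '500c.0000.1b08']}
```

Because `descendants` searches the whole tree, the sketch does not need to know the `result.body.TABLE_interface.ROW_interface` path. This is the same convenience `**` gives the JSONata expression.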
Test Code
Here is the code used for each of the test cases:
Test #1 - jsonata
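```python
import json
import timeit

import jsonata  # JSONata Python library built on JavaScript bindings


def open_file(file):
    # Read the raw JSON text from disk
    with open(file) as f:
        data = f.read()
    return data


jncontext = jsonata.Context()
input_data = open_file(file="data.json")

# Collapse the expression onto a single line before evaluation
jsonata_expr = """{
  "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
  "all_macs": $.**.eth_hw_addr
}""".replace("\n", "")


def do():
    # The JSON text is parsed inside the timed function
    result = jncontext(jsonata_expr, json.loads(input_data))
    return str(result)


# Total time, in seconds, for 3 runs
time_taken = timeit.timeit(do, number=3)
print("Execution time:", time_taken)
```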
Test #2 - jsonata-python
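```python
import json
from timeit import timeit

import jsonata  # jsonata-python: pure Python implementation of JSONata


def open_file(file):
    # Read the raw JSON text from disk
    with open(file) as f:
        data = f.read()
    return data


# Parse the JSON once, outside the timed function
input_data = open_file(file="data.json")
input_data = json.loads(input_data)

jsonata_expr = """
${
  "ip_addr_lookup": **[eth_hw_addr="500c.0000.1b08"]."eth_ip_addr",
  "all_macs": $.**.eth_hw_addr
}
"""


def do():
    # Compile and evaluate the expression on each run
    expr = jsonata.Jsonata(jsonata_expr)
    result = expr.evaluate(input_data)
    return str(result)


# Total time, in seconds, for 3 runs
time_taken = timeit(do, number=3)
print("Execution time:", time_taken)
```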
Test #3 - Pure Python
```python
import json
from timeit import timeit


def open_file(filename):
    """Open and read JSON data from a file."""
    with open(filename, "r") as file:
        return json.load(file)


def do():
    # The file is read and parsed inside the timed function
    data = open_file("data.json")
    interfaces = data["result"]["body"]["TABLE_interface"]["ROW_interface"]

    eth_hw_addr_to_ip = {}  # MAC address -> list of IP addresses
    all_eth_hw_addrs = []  # every MAC address, in document order

    for iface in interfaces:
        eth_hw_addr = iface.get("eth_hw_addr")
        eth_ip_addr = iface.get("eth_ip_addr")
        if eth_hw_addr:
            all_eth_hw_addrs.append(eth_hw_addr)
            if eth_ip_addr:
                eth_hw_addr_to_ip.setdefault(eth_hw_addr, []).append(eth_ip_addr)

    return {
        "ip_addr_lookup": eth_hw_addr_to_ip.get("500c.0000.1b08", []),
        "all_macs": all_eth_hw_addrs,
    }


# Total time, in seconds, for 3 runs
time_taken = timeit(do, number=3)
print("Execution time:", time_taken)
```
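The three tests do not time exactly the same work:
- Test #1 parses the JSON text inside `do()`.
- Test #2 parses it once beforehand but compiles the expression on every run.
- Test #3 reads and parses the file on every run.

If you want to compare only the transformation step, one option is to load the document once and time just the transform. The sketch below does this. `time_transform` and `extract` are hypothetical names introduced here, and `extract` is a condensed equivalent of the Test #3 logic.

```python
import timeit
from typing import Any, Callable


def time_transform(transform: Callable[[Any], Any], doc: Any, number: int = 3) -> float:
    """Return the total seconds for `number` calls of transform(doc).

    The document is passed in already parsed, so file I/O and JSON
    parsing are excluded from the measurement.
    """
    return timeit.timeit(lambda: transform(doc), number=number)


def extract(doc: dict) -> dict:
    # Condensed version of the Test #3 transformation (hypothetical name).
    rows = doc["result"]["body"]["TABLE_interface"]["ROW_interface"]
    macs = [r["eth_hw_addr"] for r in rows if r.get("eth_hw_addr")]
    ips = [
        r["eth_ip_addr"]
        for r in rows
        if r.get("eth_hw_addr") == "500c.0000.1b08" and r.get("eth_ip_addr")
    ]
    return {"ip_addr_lookup": ips, "all_macs": macs}


if __name__ == "__main__":
    # Small in-memory stand-in; with the real data, json.load() data.json once here.
    doc = {"result": {"body": {"TABLE_interface": {"ROW_interface": [
        {"eth_hw_addr": "5000.000c.0000", "eth_ip_addr": "172.29.151.2"},
        {"eth_hw_addr": "500c.0000.1b08", "eth_ip_addr": "10.2.1.1"},
    ]}}}}
    print(extract(doc))
    print("Execution time:", time_transform(extract, doc))
```

The same wrapper could time a JSONata evaluation by passing a function that calls the library on the pre-parsed `doc`. In that case, the expression could also be compiled outside the timed function.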
Testing Results
After running our tests, we recorded the following execution times (total seconds for three runs of each test):

| Test | Library | Time (s) |
| --- | --- | --- |
| Test #1 | jsonata | 14.011395630999687 |
| Test #2 | jsonata-python | 0.9429438649967778 |
| Test #3 | Pure Python | 0.009014303999720141 |
Result Summary
The new JSONata library is significantly faster than the previous Python library for working with JSONata: about 15 times faster (0.94 s versus 14.0 s for three runs). Pure Python is quicker still, roughly 100 times faster than jsonata-python in this test. However, jsonata-python simplifies development by letting you express parsing and transformation in an expression language instead of native Python code.
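The speedups follow directly from dividing the timings in the results table. The quick check below uses those numbers as published. Each value is the total for three runs, so dividing by three for a per-run average would not change the ratios.

```python
# Total seconds for 3 runs, copied from the results table.
timings = {
    "jsonata": 14.011395630999687,
    "jsonata-python": 0.9429438649967778,
    "pure Python": 0.009014303999720141,
}

speedup_jp = timings["jsonata"] / timings["jsonata-python"]
speedup_py_vs_jp = timings["jsonata-python"] / timings["pure Python"]
speedup_py_vs_js = timings["jsonata"] / timings["pure Python"]

print(f"jsonata-python vs jsonata:     {speedup_jp:.1f}x")        # 14.9x
print(f"pure Python vs jsonata-python: {speedup_py_vs_jp:.0f}x")  # 105x
print(f"pure Python vs jsonata:        {speedup_py_vs_js:.0f}x")  # 1554x
```

The roughly 1,554x figure is the gap between pure Python and the JavaScript-bindings library. The gap between the two JSONata libraries is about 15x.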
Therefore, if you need a tool to handle nested JSON, jsonata-python is an excellent choice.
Looking to Learn More?
Want to learn more about JSONata? If so, check out our members' tech session where we deep-dive into JSONata.