Data classes, introduced in Python 3.7, are classes used mainly (though not exclusively) to store data. They eliminate much of the boilerplate code that is typically required when writing Python classes.
Decorator
A data class is created by applying the @dataclass decorator to a class, like so:
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int
    mtu: int
Special Methods
As mentioned above, data classes reduce the amount of boilerplate code required when writing classes. By default, a data class generates the special methods __init__, __repr__ and __eq__ for you. It does not generate __str__, but print() still works because str() falls back to __repr__ when no __str__ is defined.
For example, if we create instances of the Interface data class, we can inspect an instance's attributes, compare instances, and see a readable representation of each instance, all out of the box.
>>> interface1 = Interface(name='Ethernet1/1',speed=1000,mtu=1500)
>>> interface2 = Interface(name='Ethernet1/2',speed=1000,mtu=1500)
>>> interface1.name
'Ethernet1/1'
>>> print(interface1)
Interface(name='Ethernet1/1', speed=1000, mtu=1500)
>>> interface1
Interface(name='Ethernet1/1', speed=1000, mtu=1500)
>>> interface1 == interface2
False
Normally we would have had to write the following special methods ourselves to achieve the same functionality.
class Interface:
    def __init__(self, name: str, speed: int, mtu: int) -> None:
        self.name = name
        self.speed = speed
        self.mtu = mtu

    def __repr__(self):
        ...

    def __eq__(self, other):
        ...
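To make the saving concrete, here is a minimal sketch of what hand-written versions of these methods might look like. It is an approximation for illustration, not the exact code that @dataclass generates, and it omits __str__ because print() falls back to __repr__ when __str__ is not defined.

```python
class Interface:
    """Hand-written approximation of the Interface data class (illustrative only)."""

    def __init__(self, name: str, speed: int, mtu: int) -> None:
        self.name = name
        self.speed = speed
        self.mtu = mtu

    def __repr__(self) -> str:
        # Mirror the ClassName(field=value, ...) layout a data class prints.
        return (
            f"{type(self).__name__}(name={self.name!r}, "
            f"speed={self.speed!r}, mtu={self.mtu!r})"
        )

    def __eq__(self, other: object) -> bool:
        # Only compare against instances of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.speed, self.mtu) == (other.name, other.speed, other.mtu)


interface1 = Interface("Ethernet1/1", 1000, 1500)
interface2 = Interface("Ethernet1/1", 1000, 1500)

print(interface1)                # uses __repr__, since no __str__ is defined
print(interface1 == interface2)  # True: all field values match
```

Every new attribute would have to be added in three places here, which is exactly the repetition the decorator removes.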
Default Values
Much like how default values are added to __init__, data classes allow default values to be added to your fields, like so.
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500
>>> Interface(name='Ethernet1/1')
Interface(name='Ethernet1/1', speed=1000, mtu=1500)
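One consequence worth knowing: just as with ordinary function parameters, fields without a default must come before fields with one. The sketch below uses a hypothetical class name, BrokenInterface, to show that getting the order wrong fails as soon as the class is defined. The exact wording of the error message varies between Python versions, so it is printed rather than quoted here.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BrokenInterface:  # hypothetical name, used only for this illustration
        speed: int = 1000
        name: str  # no default, but follows a field that has one
except TypeError as exc:
    # Raised at class-definition time, before any instance is created.
    error_message = str(exc)
    print(error_message)
```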
Inheritance
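Much like regular classes, data classes also support inheritance via subclassing. Here is an example. Note that vlan is annotated as Optional[int], because its default value is None rather than an int.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

@dataclass
class SVI(Interface):
    vlan: Optional[int] = None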
>>> SVI(name='Vlan100',vlan=100)
SVI(name='Vlan100', speed=1000, mtu=1500, vlan=100)
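It can help to see how the fields of the parent and the subclass are combined. The following sketch redefines both classes so that it runs on its own, then uses fields() from the dataclasses module to list the field names in order. Fields from the base class come first, which is why vlan needs a default here: it follows base-class fields that already have defaults.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500


@dataclass
class SVI(Interface):
    vlan: Optional[int] = None


# Base-class fields are listed first, followed by the subclass's own fields.
field_names = [f.name for f in fields(SVI)]
print(field_names)  # ['name', 'speed', 'mtu', 'vlan']

# Positional arguments follow the same order.
svi = SVI("Vlan100", 1000, 9000, 100)
print(svi.vlan)                    # 100
print(isinstance(svi, Interface))  # True
```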
Dictionary Conversion
The dataclasses module also provides a built-in function, asdict(), that converts a data class instance to a dict. Here is an example:
>>> from dataclasses import asdict
>>> svi = SVI(name='Vlan100',vlan=100)
>>> asdict(svi)
{'name': 'Vlan100', 'speed': 1000, 'mtu': 1500, 'vlan': 100}
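asdict() also recurses into nested data classes and into lists, tuples and dicts that contain them. The sketch below assumes a hypothetical PortChannel class (not part of the examples above) that holds a list of Interface objects, and shows that the result is a copy made of plain dicts and lists.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500


@dataclass
class PortChannel:  # hypothetical class, used only for this illustration
    name: str
    members: List[Interface]


po1 = PortChannel(
    name="Po1",
    members=[Interface("Ethernet1/1"), Interface("Ethernet1/2")],
)

as_dict = asdict(po1)
print(as_dict["members"][0])  # {'name': 'Ethernet1/1', 'speed': 1000, 'mtu': 1500}

# The dict is a copy: changing it does not touch the original instance.
as_dict["members"][0]["mtu"] = 9000
print(po1.members[0].mtu)  # 1500
```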
Type Hints
Python is a dynamically typed language, meaning that you do not have to declare a variable's type when assigning a value to it. However, this can cause problems: when your program receives data of a type that wasn't accounted for, the results can be unexpected.
Python provides a feature called type hinting, which allows you to state (yep, you guessed it) a hint of what the type should be. Python itself does not enforce these hints at runtime, so to check for type errors a type checker such as Mypy is required.
Let's look at a quick example. First we create a data class and instantiate an instance with a type error.
$ cat interface_dataclass.py
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

Interface(name='Ethernet1/1', speed="1000GB", mtu=1500)
We can then run the Mypy type checker against our file, which alerts us to the type error.
$ mypy interface_dataclass.py
interface_dataclass.py:9: error: Argument "speed" to "Interface" has incompatible type "str"; expected "int"
Found 1 error in 1 file (checked 1 source file)
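It is worth seeing why the external checker is needed. The sketch below runs the same mistaken call directly in Python and shows that nothing stops it: the annotations are stored on the class, but they are not checked when an instance is created.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500


# Runs without raising: the int annotation on speed is not enforced at runtime.
interface = Interface(name="Ethernet1/1", speed="1000GB", mtu=1500)
print(type(interface.speed).__name__)  # str

# The hints are still recorded, which is what tools such as Mypy rely on.
print(Interface.__annotations__["speed"])  # <class 'int'>
```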
Customizing Fields
The core type in dataclasses is the Field type. By default, simply adding a type-annotated class attribute creates a Field on your class, as shown in the previous examples.[1]
To customize the behaviour of a data class field, dataclasses provides the field() function, which accepts a number of parameters. Within the scope of this article we will cover two of these parameters: metadata and default_factory. Full details on all of the available parameters can be found at https://docs.python.org/3/library/dataclasses.html#dataclasses.field.
Metadata
To attach additional information to a field we can use metadata. Like so:
from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    speed: int = field(default=1000, metadata={'unit': 'megabits'})
    mtu: int = field(default=1500, metadata={'unit': 'bytes'})
To retrieve our metadata, along with other field information, we use the fields() function. It returns the fields in definition order, so index 2 is mtu.
>>> from dataclasses import fields
>>> interface = Interface(name='Ethernet1/1', speed=1000, mtu=1500)
>>> fields(interface)[2].metadata['unit']
'bytes'
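Indexing into fields() by position is brittle if fields are later reordered. As a small sketch of an alternative, the following builds a name-to-unit mapping by looping over every field. Fields defined without metadata have an empty, read-only mapping, so .get() is used to avoid a KeyError.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Interface:
    name: str
    speed: int = field(default=1000, metadata={'unit': 'megabits'})
    mtu: int = field(default=1500, metadata={'unit': 'bytes'})


interface = Interface(name='Ethernet1/1')

# Map each field name to its unit; name has no metadata, so it maps to None.
units = {f.name: f.metadata.get('unit') for f in fields(interface)}
print(units)  # {'name': None, 'speed': 'megabits', 'mtu': 'bytes'}

# Combine each value with its unit for display.
for f in fields(interface):
    print(f.name, getattr(interface, f.name), f.metadata.get('unit', ''))
```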
Default Factory
The default_factory parameter allows you to provide a zero-argument callable that will be called when a default value is needed for this field.
Let's look at an example. Here we will use a default_factory to build a list of Interface objects whenever we instantiate the Device data class.
First we define our Interface data class, much like we have done in previous examples.
from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    speed: int = field(default=1000)
    mtu: int = field(default=1500)
Next we create our zero-argument callable (the default_factory), which builds our Interface objects via a list comprehension.
def build_interfaces():
    # Builds Ethernet1/0 through Ethernet1/22 (23 interfaces).
    return [Interface(f'Ethernet1/{port}') for port in range(23)]
This default factory is then referenced within our interfaces field.
from typing import List

@dataclass
class Device:
    name: str
    vendor: str
    model: str
    interfaces: List[Interface] = field(default_factory=build_interfaces)
Now when we create an instance of Device, our default_factory is called and the interface objects are created accordingly.
>>> Device(name='rtr001',vendor='Cisco',model='Nexus9372')
Device(name='rtr001', vendor='Cisco', model='Nexus9372', interfaces=[Interface(name='Ethernet1/0', speed=1000, mtu=1500), Interface(name='Ethernet1/1', speed=1000, mtu=1500), Interface(name='Ethernet1/2', speed=1000, mtu=1500), Interface(name='Ethernet1/3', speed=1000, mtu=1500), Interface(name='Ethernet1/4', speed=1000, mtu=1500),...
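The reason a factory is needed at all is that a mutable default, such as a list, would otherwise be shared by every instance. The sketch below uses a deliberately simplified Device (interface names as plain strings) and a hypothetical BrokenDevice class to show both sides: each instance gets its own list from the factory, and dataclasses refuses a bare list default outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:  # simplified for this illustration
    name: str
    interfaces: List[str] = field(default_factory=list)


rtr001 = Device("rtr001")
rtr002 = Device("rtr002")

# The factory runs once per instance, so the lists are independent.
rtr001.interfaces.append("Ethernet1/1")
print(rtr001.interfaces)  # ['Ethernet1/1']
print(rtr002.interfaces)  # []

try:
    @dataclass
    class BrokenDevice:  # hypothetical name, used only for this illustration
        name: str
        interfaces: List[str] = []  # mutable default: rejected by dataclasses
except ValueError as exc:
    mutable_default_error = str(exc)
    print(mutable_default_error)
```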
Post-Init
Finally, we can add functionality to our data class that runs after the auto-generated __init__, via the __post_init__ special method. Like so:
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    vendor: str
    model: str

    def __post_init__(self):
        print("Device added to CMDB")
>>> device = Device(name='sw001', vendor='Cisco',model='Nexus9372')
Device added to CMDB
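Beyond printing a message, __post_init__ is a common place to validate input or to compute values from other fields. The sketch below is one possible use, not the only one: the hostname field and the example.net domain are assumptions made up for this illustration. The field is declared with init=False so that it is excluded from the generated __init__ and set in __post_init__ instead.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    vendor: str
    model: str
    # Hypothetical derived field: not accepted by __init__, set in __post_init__.
    hostname: str = field(init=False)

    def __post_init__(self):
        # Validate first, then derive values from the fields set by __init__.
        if not self.name:
            raise ValueError("name must not be empty")
        self.hostname = f"{self.name}.example.net"


device = Device(name="sw001", vendor="Cisco", model="Nexus9372")
print(device.hostname)  # sw001.example.net
```

Because the check runs on every instantiation, an invalid Device never exists in a half-built state.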
References
[1] "A brief tour of Python 3.7 data classes | Hacker Noon." 21 Jan. 2018, https://hackernoon.com/a-brief-tour-of-python-3-7-data-classes-22ee5e046517. Accessed 4 Oct. 2020.