Reputation: 8009
PEP-557 introduced data classes into the Python standard library. They can basically fill the same role as collections.namedtuple and typing.NamedTuple. Now I'm wondering how to separate the use cases in which namedtuple is still the better solution.
Of course, all the credit goes to dataclass if we need property decorators and manageable attributes. The advantages of data classes are briefly explained in the same PEP, in the section "Why not just use namedtuple".
But what about the opposite question for namedtuples: why not just use dataclass? I guess namedtuple is probably better from a performance standpoint, but I have found no confirmation of that yet.
Let's consider the following situation:
We are going to store page dimensions in a small container with statically defined fields, type hinting and named access. No further hashing, comparison and so on is needed.
NamedTuple approach:
from typing import NamedTuple
PageDimensions = NamedTuple("PageDimensions", [('width', int), ('height', int)])
DataClass approach:
from dataclasses import dataclass
@dataclass
class PageDimensions:
    width: int
    height: int
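For comparison, here is a minimal sketch that puts both definitions side by side and checks the stated requirements (named access to statically declared, annotated fields). The class-syntax NamedTuple is equivalent to the functional form above; the names PageDimensionsNT and PageDimensionsDC are illustrative renames so both can coexist.

```python
from dataclasses import dataclass, fields
from typing import NamedTuple


# Class-syntax spelling of the functional NamedTuple definition above
class PageDimensionsNT(NamedTuple):
    width: int
    height: int


# The dataclass definition above, renamed so both can live in one module
@dataclass
class PageDimensionsDC:
    width: int
    height: int


nt = PageDimensionsNT(width=210, height=297)
dc = PageDimensionsDC(width=210, height=297)

# Both give named access to statically declared, annotated fields
print(nt.width, dc.width)                 # 210 210
print(PageDimensionsNT._fields)           # ('width', 'height')
print(tuple(f.name for f in fields(dc)))  # ('width', 'height')

# The NamedTuple is also a real tuple; the dataclass is an ordinary object
print(isinstance(nt, tuple), isinstance(dc, tuple))  # True False
```

For the stated use case both work; the differences only show up in the extras (tuple behaviour, mutability, memory), which is what the answers below discuss.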
Which solution is preferable, and why?
P.S. This question isn't a duplicate of that one, because here I'm asking about the cases in which namedtuple is better, not about the differences in general (I checked the docs and sources before asking).
Upvotes: 288
Views: 116362
Reputation: 17546
Adding to @oleksandr-yarushevskyi's answer, I compared typing.NamedTuple, @dataclass, and @dataclass(frozen=True, slots=True) (note that slots=True requires Python 3.10 or later).
I used this code:
import sys
import timeit
from dataclasses import dataclass
from typing import NamedTuple
@dataclass
class PageDimensions002:
    a0: int
    a1: int
@dataclass(frozen=True, slots=True)
class PageDimensions002s:
    a0: int
    a1: int
@dataclass
class PageDimensions100:
    a0: int
    a1: int
    # ... truncated for brevity
    a98: int
    a99: int
@dataclass(frozen=True, slots=True)
class PageDimensions100s:
    a0: int
    a1: int
    # ... truncated for brevity
    a98: int
    a99: int
for n, dc, dcs in [
    (2, PageDimensions002, PageDimensions002s),
    (100, PageDimensions100, PageDimensions100s),
]:
    print(f"\nNamedTuple, {n} attributes")
    PageDimensions = NamedTuple("PageDimensions", [(f"a{i}", int) for i in range(n)])
    a = PageDimensions(*range(n))
    print("size[bytes]=", sys.getsizeof(a))
    times = timeit.repeat("a.a1", globals=globals())
    print("time[ns]=", [int(t*1000) for t in times])
    print(f"\nDataclass, {n} attributes")
    a = dc(*range(n))
    print("size[bytes]=", sys.getsizeof(a) + sys.getsizeof(vars(a)))
    times = timeit.repeat("a.a1", globals=globals())
    print("time[ns]=", [int(t*1000) for t in times])
    print(f"\nDataclass(frozen, slots), {n} attributes")
    a = dcs(*range(n))
    print("size[bytes]=", sys.getsizeof(a))
    times = timeit.repeat("a.a1", globals=globals())
    print("time[ns]=", [int(t*1000) for t in times])
And got these results:
NamedTuple, 2 attributes
size[bytes]= 56
time[ns]= [29, 29, 29, 28, 28]
Dataclass, 2 attributes
size[bytes]= 344
time[ns]= [26, 23, 23, 24, 23]
Dataclass(frozen, slots), 2 attributes
size[bytes]= 48
time[ns]= [25, 25, 23, 23, 23]
NamedTuple, 100 attributes
size[bytes]= 840
time[ns]= [30, 28, 28, 28, 30]
Dataclass, 100 attributes
size[bytes]= 3376
time[ns]= [23, 22, 21, 21, 20]
Dataclass(frozen, slots), 100 attributes
size[bytes]= 832
time[ns]= [21, 22, 19, 18, 18]
We can see that @dataclass(frozen=True, slots=True) is the smallest, and in these runs also the fastest, with both 2 and 100 attributes.
Upvotes: 0
Reputation: 11627
There is another small difference between them not mentioned so far. The fields of named tuples can be accessed both by name and by index, while the attributes of data classes can only be accessed by name. I ran into this difference when sorting a list of objects.
For named tuples, we can use both the itemgetter and attrgetter helper functions. For data classes, we can only use attrgetter.
#!/usr/bin/python
from typing import NamedTuple
from operator import itemgetter, attrgetter
# from dataclasses import dataclass
# @dataclass(frozen=True)
# class City:
#     cid: int
#     name: str
#     population: int
class City(NamedTuple):
    cid: int
    name: str
    population: int
c1 = City(1, 'Bratislava', 432000)
c2 = City(2, 'Budapest', 1759000)
c3 = City(3, 'Prague', 1280000)
c4 = City(4, 'Warsaw', 1748000)
c5 = City(5, 'Los Angeles', 3971000)
c6 = City(6, 'Edinburgh', 464000)
c7 = City(7, 'Berlin', 3671000)
cities = [c1, c2, c3, c4, c5, c6, c7]
sorted_cities = sorted(cities, key=attrgetter('name'))
for city in sorted_cities:
    print(city)
print('---------------------')
sorted_cities = sorted(cities, key=itemgetter(2))
for city in sorted_cities:
    print(city)
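To see the data class side of this, here is a small sketch using the commented-out frozen dataclass version of City with a few of the same cities: attrgetter sorts it just as it does the named tuple, while itemgetter fails because a data class defines no __getitem__ (unless you write one yourself). The variable itemgetter_error is only there to record the failure.

```python
from dataclasses import dataclass
from operator import attrgetter, itemgetter


@dataclass(frozen=True)
class City:
    cid: int
    name: str
    population: int


cities = [
    City(1, 'Bratislava', 432000),
    City(2, 'Budapest', 1759000),
    City(3, 'Prague', 1280000),
]

# attrgetter works for data classes exactly as for named tuples
by_population = sorted(cities, key=attrgetter('population'))
print([c.name for c in by_population])  # ['Bratislava', 'Prague', 'Budapest']

# itemgetter needs __getitem__, which a data class does not define
itemgetter_error = None
try:
    sorted(cities, key=itemgetter(2))
except TypeError as exc:
    itemgetter_error = exc  # e.g. "'City' object is not subscriptable"
print("itemgetter failed:", itemgetter_error)
```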
Upvotes: 3
Reputation: 1499
I had the same question, so I ran a few tests and documented them here: https://shayallenhill.com/python-struct-options/
Summary:
To do this, define a type inheriting from typing.NamedTuple...
from typing import NamedTuple
class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float
...then unpack it inside your functions. Don't use the .attributes, and you'll have a nice "type hint" without any PITA for the caller.
*focus, radius = circle_arg_instance # or tuple
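A minimal sketch of that pattern, assuming a hypothetical bounding_box function as the consumer: because the function only unpacks its argument and never reads .x, .y or .radius, callers can pass either a CircleArg or a bare (x, y, radius) tuple.

```python
from typing import NamedTuple, Tuple, Union


class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float


def bounding_box(circle_arg: Union[CircleArg, Tuple[float, float, float]]) -> Tuple[float, float, float, float]:
    """Hypothetical helper: (min_x, min_y, max_x, max_y) of the circle."""
    # Unpack instead of reading .x/.y/.radius, so a bare tuple works as well
    *focus, radius = circle_arg
    cx, cy = focus
    return (cx - radius, cy - radius, cx + radius, cy + radius)


print(bounding_box(CircleArg(1.0, 2.0, 0.5)))  # (0.5, 1.5, 1.5, 2.5)
print(bounding_box((1.0, 2.0, 0.5)))           # (0.5, 1.5, 1.5, 2.5)
```

The type hint documents the expected shape, while the unpacking keeps the function usable with plain tuples.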
Upvotes: 42
Reputation: 591
I didn't see any of the other answers mention it, but in my opinion one of the most important differences is to do with how equality and comparison work. When you compare named tuples, the names are ignored: two named tuples are equal if they contain the same values in the same order, even if they have different class names or field names:
>>> from collections import namedtuple
>>> A = namedtuple('A', ())
>>> B = namedtuple('B', ())
>>> a = A()
>>> b = B()
>>> a == b
True
Dataclass instances, on the other hand, are only considered equal if they are of the same type. I pretty much always want the latter behaviour: I expect things of different types to be distinct.
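A short sketch of both behaviours side by side (the classes A, B, C and D are illustrative): named tuples compare like plain tuples, even against an ordinary tuple, while the generated dataclass __eq__ only compares fields when both objects have exactly the same class.

```python
from collections import namedtuple
from dataclasses import dataclass

A = namedtuple('A', ('x', 'y'))
B = namedtuple('B', ('u', 'v'))


@dataclass
class C:
    x: int
    y: int


@dataclass
class D:
    x: int
    y: int


print(A(1, 2) == B(1, 2))  # True: named tuples compare as plain tuples
print(A(1, 2) == (1, 2))   # True: even against an ordinary tuple
print(C(1, 2) == D(1, 2))  # False: different dataclass types
print(C(1, 2) == C(1, 2))  # True: same type, same field values
```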
Upvotes: 49
Reputation: 17850
Another important limitation of NamedTuple is that, before Python 3.11, it cannot be generic:
import typing as t
T = t.TypeVar('T')
class C(t.Generic[T], t.NamedTuple): ...
TypeError: Multiple inheritance with NamedTuple is not supported
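On versions where a generic NamedTuple is not available, a frozen dataclass can fill the same role. The sketch below (Box is a hypothetical name) is generic over T, immutable and hashable, though unlike a NamedTuple it does not support indexing or unpacking.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar('T')


@dataclass(frozen=True)
class Box(Generic[T]):
    """A generic, immutable container standing in for a generic NamedTuple."""
    value: T


int_box: Box[int] = Box(42)
str_box: Box[str] = Box("hello")
print(int_box, str_box)  # Box(value=42) Box(value='hello')
```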
Upvotes: 14
Reputation: 8412
One use case for me is frameworks that do not support dataclasses, in particular TensorFlow. There, a tf.function can work with a typing.NamedTuple but not with a dataclass.
import typing
import tensorflow as tf
class MyFancyData(typing.NamedTuple):
    some_tensor: tf.Tensor
    some_other_stuf: ...
@tf.function
def train_step(self, my_fancy_data: MyFancyData):
    ...
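One plausible reason for this (an assumption, not something the answer states) is that such frameworks walk nested tuples and lists to find their leaves, and a NamedTuple is a tuple, so it gets traversed automatically while a dataclass is treated as one opaque object. The hypothetical flatten helper below illustrates the idea using only the standard library; it is not TensorFlow's implementation.

```python
from dataclasses import dataclass
from typing import NamedTuple


def flatten(structure):
    """Hypothetical helper: recursively flatten tuples/lists into a list of leaves."""
    if isinstance(structure, (tuple, list)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    return [structure]


class PairNT(NamedTuple):
    a: int
    b: int


@dataclass
class PairDC:
    a: int
    b: int


print(flatten(PairNT(1, 2)))  # [1, 2]: traversed like any other tuple
print(flatten(PairDC(1, 2)))  # [PairDC(a=1, b=2)]: the dataclass is an opaque leaf
```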
Upvotes: 9
Reputation: 3279
It depends on your needs. Each of them has its own benefits.
Here is a good explanation of dataclasses: Raymond Hettinger's PyCon 2018 talk "Dataclasses: The code generator to end all code generators".
In a Dataclass, all of the generated methods are written in Python, whereas in a NamedTuple all of these behaviors come for free because NamedTuple inherits from tuple. And because tuple is implemented in C, standard methods (hashing, comparison, etc.) are faster in a NamedTuple.
Note also that a Dataclass stores its attributes in a per-instance dict (unless slots are used), whereas a NamedTuple is based on tuple. Thus you get the advantages and disadvantages of those structures. For example, space usage is lower with a NamedTuple, but attribute access is faster with a Dataclass.
Please see my experiment:
In [33]: a = PageDimensionsDC(width=10, height=10)
In [34]: sys.getsizeof(a) + sys.getsizeof(vars(a))
Out[34]: 168
In [35]: %timeit a.width
43.2 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [36]: a = PageDimensionsNT(width=10, height=10)
In [37]: sys.getsizeof(a)
Out[37]: 64
In [38]: %timeit a.width
63.6 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
But as the number of NamedTuple fields grows, access time stays just as small, because for each field it creates a property named after that field. For example, in our case part of the namespace of the new class will look like this:
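from operator import itemgetter
class_namespace = {
    # ... other generated entries ...
    # (CPython 3.8+ uses an equivalent C-level descriptor instead of property)
    'width': property(itemgetter(0), doc="Alias for field number 0"),
    'height': property(itemgetter(1), doc="Alias for field number 1"),
}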
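To illustrate that mechanism, here is a sketch of a hand-written tuple subclass (PageDimensionsManual is a hypothetical name) whose fields are read-only properties over tuple indexes, compared with a real namedtuple. It mirrors the idea, not the exact code namedtuple generates.

```python
from collections import namedtuple
from operator import itemgetter


# Hypothetical hand-written equivalent of what namedtuple generates:
# a tuple subclass whose fields are read-only properties over tuple indexes
class PageDimensionsManual(tuple):
    __slots__ = ()

    def __new__(cls, width, height):
        return super().__new__(cls, (width, height))

    width = property(itemgetter(0), doc="Alias for field number 0")
    height = property(itemgetter(1), doc="Alias for field number 1")


PageDimensionsNT = namedtuple("PageDimensionsNT", ["width", "height"])

manual = PageDimensionsManual(10, 20)
real = PageDimensionsNT(10, 20)
print(manual.width, manual.height)     # 10 20
print(real.width, real.height)         # 10 20
print(PageDimensionsNT.width.__doc__)  # Alias for field number 0
```

Each field lookup is a single index into the tuple, which is why access time does not depend on the number of fields.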
In which cases is namedtuple still a better choice?
When your data structure needs to (or can) be immutable, hashable, iterable, unpackable and comparable, you can use NamedTuple. If you need something more complicated, for example the possibility of inheritance for your data structure, then use Dataclass.
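A sketch of that checklist (PointNT and PointDC are illustrative names): the NamedTuple is hashable, ordered, iterable and unpackable by default, whereas a dataclass has to opt in with frozen=True and order=True, and still needs dataclasses.astuple (or a hand-written __iter__) to be unpacked.

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


# A plain @dataclass would be mutable, unhashable and unordered;
# frozen=True and order=True opt into what NamedTuple gives by default
@dataclass(frozen=True, order=True)
class PointDC:
    x: int
    y: int


nt, dc = PointNT(1, 2), PointDC(1, 2)

print(hash(nt) == hash((1, 2)))   # True: hashes like the equivalent tuple
print({dc: "ok"}[PointDC(1, 2)])  # ok: hashable because frozen=True
print(PointNT(1, 3) > nt, PointDC(1, 3) > dc)  # True True

x, y = nt             # a NamedTuple unpacks directly
x2, y2 = astuple(dc)  # a dataclass needs astuple() (or its own __iter__)
print(list(nt), (x, y) == (x2, y2))  # [1, 2] True
```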
Upvotes: 209
Reputation: 1784
In programming in general, anything that CAN be immutable SHOULD be immutable. We gain two things:
That's why, if the data is immutable, you should use a named tuple instead of a dataclass.
I wrote this in a comment, but I'll mention it here too:
You're definitely right that there is an overlap, especially with frozen=True in dataclasses, but there are still features that belong to namedtuples, such as unpacking and always being immutable. I doubt they'll remove namedtuples as such.
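To make the overlap concrete, here is a sketch comparing a NamedTuple with a frozen dataclass (SizeNT and SizeDC are illustrative names): both reject attribute assignment and both are "modified" by building a copy, but only the NamedTuple unpacks like a tuple.

```python
import dataclasses
from typing import NamedTuple


class SizeNT(NamedTuple):
    width: int
    height: int


@dataclasses.dataclass(frozen=True)
class SizeDC:
    width: int
    height: int


nt, dc = SizeNT(10, 20), SizeDC(10, 20)

rejected = []
for obj in (nt, dc):
    try:
        obj.width = 99
    except AttributeError as exc:  # FrozenInstanceError is a subclass of AttributeError
        rejected.append(type(exc).__name__)
print(rejected)  # ['AttributeError', 'FrozenInstanceError']

# "Changing" either one means building a modified copy
print(nt._replace(width=99))              # SizeNT(width=99, height=20)
print(dataclasses.replace(dc, width=99))  # SizeDC(width=99, height=20)

# Only the NamedTuple unpacks like a tuple
width, height = nt
```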
Upvotes: 49