Reputation: 101
itertuples is a nice way to iterate over a pandas DataFrame, and it returns a namedtuple for each row.
import pandas as pd
import numpy as np
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    print(row)
<class 'pandas.core.frame.Pandas'>
Pandas(Index='dog', num_legs=4, num_wings=0)
<class 'pandas.core.frame.Pandas'>
Pandas(Index='hawk', num_legs=2, num_wings=2)
What is the correct way, if any, to add type hints to the returned namedtuples?
Upvotes: 10
Views: 2158
Reputation: 331
One possible solution, if the column names and data types are fixed, is to declare the structure of a DataFrame row explicitly as a NamedTuple:
from typing import NamedTuple
import pandas as pd
class Row(NamedTuple):
    Index: str  # itertuples() puts the index first by default
    num_legs: int
    num_wings: int
data = {"num_legs": [4, 2], "num_wings": [0, 2]}
df = pd.DataFrame(data, index=["dog", "hawk"])
row: Row  # static hint only; nothing is checked at runtime
for row in df.itertuples(name="Row"):
    print(row.num_legs)
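A possible variation on the approach above, sketched under the assumption that the column order in the DataFrame matches the field order of the class: build real Row instances instead of only annotating the loop variable. The helper name iter_rows is hypothetical, and an Index field is included because itertuples() puts the index first by default. Passing name=None makes itertuples() return plain tuples, which are then unpacked into Row.

```python
from typing import Iterator, NamedTuple

import pandas as pd


class Row(NamedTuple):
    Index: str
    num_legs: int
    num_wings: int


def iter_rows(df: pd.DataFrame) -> Iterator[Row]:
    """Yield each DataFrame row as a real Row instance (hypothetical helper)."""
    # name=None makes itertuples() return plain tuples: (index, col1, col2, ...)
    for values in df.itertuples(name=None):
        # Assumes the column order matches the field order declared in Row.
        yield Row(*values)


df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])
rows = list(iter_rows(df))
for row in rows:
    print(row.Index, row.num_legs, row.num_wings)
```

With this shape the return annotation describes the objects actually produced, so a type checker and an IDE can both resolve row.num_legs. It still does not validate the values themselves.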
Upvotes: 2
Reputation: 1737
Here's a slightly modified version of Bravhek's answer, with runtime type checking added:
from typing import NamedTuple, get_type_hints
import pandas as pd
Row = NamedTuple(
    "Animal",
    [("Index", str), ("num_legs", int), ("num_wings", int)],
)
df = pd.DataFrame(
    {"num_legs": [4, 2, 'a'], "num_wings": [0, 2, 3]},
    index=["dog", "hawk", "bad_record"],
)
# Just a protocol type hint:
row: Row
for row in df.itertuples():
    print(row.num_legs)
# Actual type checking:
if set(Row._fields) != set(df.columns.tolist()) | {'Index'}:
    print('columns mismatch')
hints = get_type_hints(Row)  # resolve the hints once, outside the loops
for row in df.itertuples():
    for fn in Row._fields:
        if not isinstance(getattr(row, fn), hints[fn]):
            print('type mismatch in column "{}", row "{}"'.format(fn, row))
    print(row.num_legs)
It prints the following:
4
2
a
4
2
type mismatch in column "num_legs", row "Pandas(Index='bad_record', num_legs='a', num_wings=3)"
a
The "protocol type hint" part can be useful for silencing IDE warnings (e.g. PyCharm's "unresolved attribute reference"), but it does not validate anything at runtime.
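The runtime check could also be wrapped in a reusable helper that collects mismatches instead of printing them. This is only a sketch: the function name find_type_mismatches and the Animal class are hypothetical, and it assumes every hint is a plain class such as int or str, because isinstance does not accept generic hints like List[int].

```python
from typing import Any, List, NamedTuple, Tuple, get_type_hints

import pandas as pd


class Animal(NamedTuple):
    Index: str
    num_legs: int
    num_wings: int


def find_type_mismatches(df: pd.DataFrame, schema: type) -> List[Tuple[Any, str, Any]]:
    """Return (index, field, value) for each value whose type differs from the schema."""
    hints = get_type_hints(schema)  # e.g. {"Index": str, "num_legs": int, ...}
    mismatches = []
    for row in df.itertuples():
        for field, expected in hints.items():
            value = getattr(row, field)
            # Only works for plain classes; generic hints would raise TypeError here.
            if not isinstance(value, expected):
                mismatches.append((row.Index, field, value))
    return mismatches


df = pd.DataFrame(
    {"num_legs": [4, 2, "a"], "num_wings": [0, 2, 3]},
    index=["dog", "hawk", "bad_record"],
)
mismatches = find_type_mismatches(df, Animal)
print(mismatches)
```

Returning the mismatches leaves it to the caller to decide whether to log them, drop the rows, or raise an error.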
Upvotes: 0
Reputation: 321
import pandas as pd
import numpy as np
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    print(row)
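You'll notice that the reported type is pandas.core.frame.Pandas, but using that name as an annotation raises an error, because the class is created dynamically and is not an attribute of pandas.core.frame. Annotating with pd.core.frame.pandas (lowercase) runs without error, although Python does not enforce annotations at runtime, so this does not validate the row: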
import pandas as pd
import numpy as np
def test_chk(row2chk: pd.core.frame.pandas):
    print(row2chk)
    print(row2chk.num_legs)  # prints the value in the num_legs column
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    test_chk(row)
Upvotes: -1
Reputation:
I don't think it's possible, because a DataFrame can hold arbitrary data types, so the tuples can contain any type present in the DataFrame. Just as you can't use Python type hints to specify the column types of a DataFrame, you can't explicitly type those named tuples.
If you need the column types before entering your for loop, you can use df.dtypes, which gives you a Series with the dtype of each column.
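As a small illustration of that suggestion, the sketch below inspects df.dtypes and branches on one column's dtype before looping. The DataFrame is the one from the question; the variable names column_types and total_legs are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])

# df.dtypes is a Series indexed by column name, with one dtype per column.
column_types = df.dtypes
print(column_types)

# Check a column's dtype once, before entering the row loop.
if pd.api.types.is_integer_dtype(column_types["num_legs"]):
    total_legs = sum(row.num_legs for row in df.itertuples())
    print(total_legs)
```

This checks the dtype of the whole column, which is a property of the DataFrame at runtime. It gives a static type checker no information about the row tuples.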
Upvotes: 2