Reputation: 1582
I need to iterate over a pandas DataFrame in order to pass each row as the arguments of a function (actually, a class constructor) with **kwargs. This means that each row should behave as a dictionary whose keys are the column names and whose values are that row's values.
This works, but it performs very badly:
import pandas as pd

def myfunc(**kwargs):
    try:
        area = kwargs.get('length', 0) * kwargs.get('width', 0)
        return area
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})
# each df.iloc[i] is a Series, which ** unpacks by column name
for i in range(len(df)):
    print(myfunc(**df.iloc[i]))
Any suggestions on how to make this faster? I have tried iterating with df.iterrows(), but I get the following error:
TypeError: myfunc() argument after ** must be a mapping, not tuple
I have also tried df.itertuples() and df.values, but either I am missing something, or they force me to convert each tuple / np.array to a pd.Series or dict, which will also be slow.
My constraint is that the script has to work with Python 2.7 and pandas 0.14.1.
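For reference, the conversion mentioned above can be written without an intermediate Series. The sketch below is a minimal illustration that assumes the question's myfunc and example data: it reads the column names once and pairs them with each row of df.values to build a plain dict. It has not been benchmarked here; note also that df.values upcasts mixed column types to a common dtype (often object).

```python
import pandas as pd

def myfunc(**kwargs):
    # same area calculation as in the question
    try:
        return kwargs.get('length', 0) * kwargs.get('width', 0)
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Read the column names once, then zip them with each row of the
# underlying NumPy array to build a plain dict (no Series per row).
columns = list(df.columns)
results = [myfunc(**dict(zip(columns, row))) for row in df.values]
print(results)
```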
Upvotes: 37
Views: 63972
Reputation: 2646
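One clean option is to iterate over the records returned by to_dict(orient="records"), where each record is a plain dict keyed by column name: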
for row_dict in df.to_dict(orient="records"):
    print(row_dict['column_name'])
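Applied to the question's myfunc, the records approach might look like the sketch below. It assumes a pandas version that accepts to_dict(orient="records"); very old releases such as the 0.14.1 mentioned in the question may spell this argument differently, so check that version's documentation.

```python
import pandas as pd

def myfunc(**kwargs):
    # same area calculation as in the question
    try:
        return kwargs.get('length', 0) * kwargs.get('width', 0)
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# to_dict(orient="records") returns a list with one dict per row,
# keyed by column name, which ** can unpack directly.
records = df.to_dict(orient="records")
areas = [myfunc(**row_dict) for row_dict in records]
print(areas)
```

Because all rows are materialised as dicts up front, this trades some memory for avoiding per-row Series construction.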
Upvotes: 84
Reputation: 164773
Calling a separate Python function for every row will be inefficient, because the calculation is applied row by row. It is more efficient to compute a new Series with vectorised operations, then iterate over that Series:
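from __future__ import print_function  # lets print(*..., sep=...) run on Python 2.7
import pandas as pd
df = pd.DataFrame({'length': [1, 2, 3, 'test'], 'width': [10, 20, 30, 'hello']})
# convert every column to numbers; non-numeric entries become NaN
df2 = df.apply(pd.to_numeric, errors='coerce')
error_str = 'Error : length and width should be int or float'
# multiply column-wise, then replace NaN results with the error message
print(*(df2['length'] * df2['width']).fillna(error_str), sep='\n')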
10.0
40.0
90.0
Error : length and width should be int or float
Upvotes: 1
Reputation: 5622
You can try:
for k, row in df.iterrows():
myfunc(**row)
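Here k is the DataFrame index label and row is a pandas Series indexed by the column names. A Series supports keys() and item lookup, so ** unpacks it like a dict, and you can access any column with row["my_column_name"]. Note that iterrows() yields (index, row) pairs; passing the pair itself, rather than row, to myfunc(**...) raises "argument after ** must be a mapping, not tuple".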
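A minimal sketch, assuming the question's myfunc and example data, that contrasts unpacking the (index, row) pair in the loop with passing the whole pair, which reproduces the TypeError from the question:

```python
import pandas as pd

def myfunc(**kwargs):
    # same area calculation as in the question
    try:
        return kwargs.get('length', 0) * kwargs.get('width', 0)
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Unpacking the pair in the for statement leaves `row` as a Series,
# which ** accepts as a mapping.
areas = [myfunc(**row) for _, row in df.iterrows()]
print(areas)

# Passing the whole (index, Series) tuple fails before myfunc even runs,
# so myfunc's own try/except does not catch it.
error_message = None
try:
    for pair in df.iterrows():
        myfunc(**pair)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```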
Upvotes: 26