Reputation: 11
I'm trying to build a data frame based on another one. In order to build the second one, I need to loop over the first data frame, make some changes to the data, and insert it into the second one. I am using a namedtuple for my for loop.
This loop is taking a lot of time to process 2M rows of data. Is there any faster way to do this?
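Roughly, what I'm doing looks like this (a simplified sketch; df1, df2 and the per-row change are placeholders for the real data and logic):
import pandas as pd

# toy stand-in for the first data frame
df1 = pd.DataFrame({'a': range(5), 'b': range(5)})

new_rows = []
# itertuples() yields one namedtuple per row
for row in df1.itertuples(index=False):
    # placeholder change; the real transformation is more involved
    new_rows.append({'a': row.a * 2, 'b': row.b + 1})

df2 = pd.DataFrame(new_rows)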
Upvotes: 0
Views: 1682
Reputation: 630
I'd recommend using the iterrows() method that is built into pandas:
import pandas as pd

data = {'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]}
db = pd.DataFrame(data)
print(f"Dataframe:\n{db}\n")

# iterrows() yields an (index, row) pair for every row; each row is a Series
for index, row in db.iterrows():
    print(f"Row Index:{index}")
    print(f"Row:\n{row}\n")
The output of the above:
Dataframe:
Name Age
0 John 20
1 Paul 21
2 George 19
Row Index:0
Row:
Name John
Age 20
Name: 0, dtype: object
Row Index:1
Row:
Name Paul
Age 21
Name: 1, dtype: object
Row Index:2
Row:
Name George
Age 19
Name: 2, dtype: object
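If the goal is to build a second data frame from the first, one way to use the same loop is to collect the modified rows in a list and construct the new frame once at the end (a sketch; the Age change is just a placeholder for your own transformation):
new_rows = []
for index, row in db.iterrows():
    # placeholder change: bump Age by one
    new_rows.append({'Name': row['Name'], 'Age': row['Age'] + 1})

db2 = pd.DataFrame(new_rows)
print(db2)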
Upvotes: 0
Reputation: 116
Since a pandas DataFrame is organised by columns, row-by-row iteration isn't what it is optimised for. However, this is the way I process each row of a DataFrame:
# 'table' is the source DataFrame; zipping its columns yields one tuple per row
rows = zip(*(table.loc[:, each] for each in table))
for rowNum, record in enumerate(rows):
    # If you want to process each record, do it here; otherwise just print it
    print("Row", rowNum, "records:", record)
Btw, I'd still suggest looking for built-in pandas methods that can process your first dataframe - they will usually be quicker and more effective than a hand-written loop, for example the sketch below. Hope this helps.
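For instance, a whole-column operation like the one below replaces the explicit loop entirely (a sketch; the column name and the + 1 change are only placeholders for whatever your real transformation is):
import pandas as pd

table = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

# Vectorised: apply the change to the whole column at once instead of looping
result = table.assign(Age=table['Age'] + 1)
print(result)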
Upvotes: 1