Avoid a for loop when calculating per-id values in a pandas DataFrame (Python)

Question

I am working in Python with a pandas DataFrame on which I have to do some calculations:

As the images show, I have a lot of data with different ids. I need to run a calculation separately for each id, and this is what I am doing right now:

array_id_ad_hs = df['column_id'].unique()
for id in array_id_ad_hs: 
    df_history = df[df['column_id']==id]
    df_history['new_column'] = 1000 -  df_history['temporary_sum'].cumsum()

Is there a better/faster way to do these operations?

Daniel R · Accepted Answer

You can use groupby with cumsum, which computes the running sum within each id without an explicit loop:

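import pandas as pd
import numpy as np

# Example data: three ids with five random values each
df = pd.DataFrame({'id': np.repeat(['A', 'B', 'C'], 5), 'x': np.random.normal(300, 100, 15)})
# Cumulative sum of x within each id, offset by 1000
df['y'] = 1000 + df.groupby('id')['x'].cumsum()
print(df)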

Output (the values of x are random, so they differ between runs):

   id           x            y
0   A  265.331439  1265.331439
1   A  392.658450  1657.989889
2   A  223.808512  1881.798401
3   A  209.223416  2091.021817
4   A  253.292921  2344.314738
5   B  425.387435  1425.387435
6   B  171.922392  1597.309827
7   B  198.998873  1796.308699
8   B  168.298701  1964.607401
9   B  347.075096  2311.682497
10  C  374.944209  1374.944209
11  C  310.802718  1685.746927
12  C  361.621695  2047.368623
13  C  250.134388  2297.503011
14  C  294.190045  2591.693056
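
Applied to the column names from the question, the same idea might look like the sketch below. The sample values are made up for illustration, and the sign follows the question (1000 minus the running sum) rather than the example above, which adds. The loop at the end is only a cross-check that the grouped version gives the same numbers as filtering one id at a time.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
# Rows for the two ids are deliberately interleaved.
df = pd.DataFrame({
    'column_id': ['A', 'A', 'B', 'B', 'A', 'B'],
    'temporary_sum': [100, 200, 50, 25, 300, 10],
})

# Running sum of temporary_sum within each id, subtracted from 1000.
# The result is aligned to the original row index, so it can be assigned directly.
df['new_column'] = 1000 - df.groupby('column_id')['temporary_sum'].cumsum()

# Cross-check: the per-id loop from the question, collected into one Series.
loop_result = pd.concat([
    1000 - df.loc[df['column_id'] == key, 'temporary_sum'].cumsum()
    for key in df['column_id'].unique()
]).sort_index()

print(df)
print((loop_result == df['new_column']).all())
```

Because the grouped cumsum keeps the original row order, the rows do not need to be sorted by id first.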
