python: iteration through a grouped dataframe

Question

this dataframe has been reordered to assure that the categories in column 'x' are grouped together with column 'y' in an ascending order. once I performed these changes, I am trying to sum column 'y' and 'z' for every category 'x' so that each row is always summed to the previous rows within the same category. The output displays the correct result but I am now trying to find a way to simplify the last 12 lines instead of writing a line for each row of the dataframe.

import numpy as np
import pandas as pd
data=[['A', 1,2 ],
        ['B', 5,5],
        ['C', 4,6]
        ,['A', 6,5],
        ['B',9,9],
        ['C', 7,9]
        ,['A', 2,3],
        ['B', 5,1],
        ['C',2,10]
        ,['B', 8,2],
        ['B', 5,4],
        ['C', 8,5 ]]
df = pd.DataFrame(data, columns=['x','y','z'])
df = df.sort_values(by =['x', 'y'], ascending=True)
pd.set_option('display.max_rows',14)
print(df.head(14))
print (sum(df.iloc[0:1]['y'] + df.iloc[0:1]['z']))
print (sum(df.iloc[0:2]['y'] + df.iloc[0:2]['z']))
print (sum(df.iloc[0:3]['y'] + df.iloc[0:3]['z']))

print (sum(df.iloc[3:4]['y'] + df.iloc[3:4]['z']))
print (sum(df.iloc[3:5]['y'] + df.iloc[3:5]['z']))
print (sum(df.iloc[3:6]['y'] + df.iloc[3:6]['z']))
print (sum(df.iloc[3:7]['y'] + df.iloc[3:7]['z']))
print (sum(df.iloc[3:8]['y'] + df.iloc[3:8]['z']))

print (sum(df.iloc[8:9]['y'] + df.iloc[8:9]['z']))
print (sum(df.iloc[8:10]['y'] + df.iloc[8:10]['z']))      
print (sum(df.iloc[8:11]['y'] + df.iloc[8:11]['z']))
print (sum(df.iloc[8:12]['y'] + df.iloc[8:12]['z']))

ifly6 · Accepted Answer

I think what you're looking for is something like:

>>> df.groupby('x')[['y', 'z']].cumsum().eval('sum_yz = y + z')

     y   z  sum_yz
0    1   2       3
6    3   5       8
3    9  10      19
1    5   5      10
7   10   6      16
10  15  10      25
9   23  12      35
4   32  21      53
8    2  10      12
2    6  16      22
5   13  25      38
11  21  30      51

You can make it simpler to understand by having a second line assigning a new column as df['sum_yz'] = df['y'] + df['z'] but df.eval works here. If you want to return this with the rest of the data, I would drop the y and z columns in the new data frame, then pd.concat passing axis=1 with the original data frame.

If you just want the final sums themselves, that is even easier:

>>> df.groupby('x')[['y', 'z']].sum().eval('sum_yz = y + z')

    y   z  sum_yz
x                
A   9  10      19
B  32  21      53
C  21  30      51

python: iteration through a grouped dataframe

Answers (2)

Related Questions