d8a988
d8a988

Reputation: 71

python: iteration through a grouped dataframe

this dataframe has been reordered to assure that the categories in column 'x' are grouped together with column 'y' in an ascending order. once I performed these changes, I am trying to sum column 'y' and 'z' for every category 'x' so that each row is always summed to the previous rows within the same category. The output displays the correct result but I am now trying to find a way to simplify the last 12 lines instead of writing a line for each row of the dataframe.

import numpy as np
import pandas as pd
data=[['A', 1,2 ],
        ['B', 5,5],
        ['C', 4,6]
        ,['A', 6,5],
        ['B',9,9],
        ['C', 7,9]
        ,['A', 2,3],
        ['B', 5,1],
        ['C',2,10]
        ,['B', 8,2],
        ['B', 5,4],
        ['C', 8,5 ]]
df = pd.DataFrame(data, columns=['x','y','z'])
df = df.sort_values(by =['x', 'y'], ascending=True)
pd.set_option('display.max_rows',14)
print(df.head(14))
print (sum(df.iloc[0:1]['y'] + df.iloc[0:1]['z']))
print (sum(df.iloc[0:2]['y'] + df.iloc[0:2]['z']))
print (sum(df.iloc[0:3]['y'] + df.iloc[0:3]['z']))

print (sum(df.iloc[3:4]['y'] + df.iloc[3:4]['z']))
print (sum(df.iloc[3:5]['y'] + df.iloc[3:5]['z']))
print (sum(df.iloc[3:6]['y'] + df.iloc[3:6]['z']))
print (sum(df.iloc[3:7]['y'] + df.iloc[3:7]['z']))
print (sum(df.iloc[3:8]['y'] + df.iloc[3:8]['z']))

print (sum(df.iloc[8:9]['y'] + df.iloc[8:9]['z']))
print (sum(df.iloc[8:10]['y'] + df.iloc[8:10]['z']))      
print (sum(df.iloc[8:11]['y'] + df.iloc[8:11]['z']))
print (sum(df.iloc[8:12]['y'] + df.iloc[8:12]['z']))

Upvotes: 1

Views: 59

Answers (2)

ifly6
ifly6

Reputation: 5331

I think what you're looking for is something like:

>>> df.groupby('x')[['y', 'z']].cumsum().eval('sum_yz = y + z')

     y   z  sum_yz
0    1   2       3
6    3   5       8
3    9  10      19
1    5   5      10
7   10   6      16
10  15  10      25
9   23  12      35
4   32  21      53
8    2  10      12
2    6  16      22
5   13  25      38
11  21  30      51

You can make it simpler to understand by having a second line assigning a new column as df['sum_yz'] = df['y'] + df['z'] but df.eval works here. If you want to return this with the rest of the data, I would drop the y and z columns in the new data frame, then pd.concat passing axis=1 with the original data frame.

If you just want the final sums themselves, that is even easier:

>>> df.groupby('x')[['y', 'z']].sum().eval('sum_yz = y + z')

    y   z  sum_yz
x                
A   9  10      19
B  32  21      53
C  21  30      51

Upvotes: 2

Corralien
Corralien

Reputation: 120399

Use cumsum and sum the result along the columns:

>>> df.groupby('x').cumsum().sum(axis=1)
0      3
6      8
3     19
1     10
7     16
10    25
9     35
4     53
8     12
2     22
5     38
11    51
dtype: int64

Upvotes: 3

Related Questions