Reputation: 71
this dataframe has been reordered to assure that the categories in column 'x' are grouped together with column 'y' in an ascending order. once I performed these changes, I am trying to sum column 'y' and 'z' for every category 'x' so that each row is always summed to the previous rows within the same category. The output displays the correct result but I am now trying to find a way to simplify the last 12 lines instead of writing a line for each row of the dataframe.
import numpy as np
import pandas as pd
data=[['A', 1,2 ],
['B', 5,5],
['C', 4,6]
,['A', 6,5],
['B',9,9],
['C', 7,9]
,['A', 2,3],
['B', 5,1],
['C',2,10]
,['B', 8,2],
['B', 5,4],
['C', 8,5 ]]
df = pd.DataFrame(data, columns=['x','y','z'])
df = df.sort_values(by =['x', 'y'], ascending=True)
pd.set_option('display.max_rows',14)
print(df.head(14))
print (sum(df.iloc[0:1]['y'] + df.iloc[0:1]['z']))
print (sum(df.iloc[0:2]['y'] + df.iloc[0:2]['z']))
print (sum(df.iloc[0:3]['y'] + df.iloc[0:3]['z']))
print (sum(df.iloc[3:4]['y'] + df.iloc[3:4]['z']))
print (sum(df.iloc[3:5]['y'] + df.iloc[3:5]['z']))
print (sum(df.iloc[3:6]['y'] + df.iloc[3:6]['z']))
print (sum(df.iloc[3:7]['y'] + df.iloc[3:7]['z']))
print (sum(df.iloc[3:8]['y'] + df.iloc[3:8]['z']))
print (sum(df.iloc[8:9]['y'] + df.iloc[8:9]['z']))
print (sum(df.iloc[8:10]['y'] + df.iloc[8:10]['z']))
print (sum(df.iloc[8:11]['y'] + df.iloc[8:11]['z']))
print (sum(df.iloc[8:12]['y'] + df.iloc[8:12]['z']))
Upvotes: 1
Views: 59
Reputation: 5331
I think what you're looking for is something like:
>>> df.groupby('x')[['y', 'z']].cumsum().eval('sum_yz = y + z')
y z sum_yz
0 1 2 3
6 3 5 8
3 9 10 19
1 5 5 10
7 10 6 16
10 15 10 25
9 23 12 35
4 32 21 53
8 2 10 12
2 6 16 22
5 13 25 38
11 21 30 51
You can make it simpler to understand by having a second line assigning a new column as df['sum_yz'] = df['y'] + df['z']
but df.eval
works here. If you want to return this with the rest of the data, I would drop the y
and z
columns in the new data frame, then pd.concat
passing axis=1
with the original data frame.
If you just want the final sums themselves, that is even easier:
>>> df.groupby('x')[['y', 'z']].sum().eval('sum_yz = y + z')
y z sum_yz
x
A 9 10 19
B 32 21 53
C 21 30 51
Upvotes: 2
Reputation: 120399
Use cumsum
and sum
the result along the columns:
>>> df.groupby('x').cumsum().sum(axis=1)
0 3
6 8
3 19
1 10
7 16
10 25
9 35
4 53
8 12
2 22
5 38
11 51
dtype: int64
Upvotes: 3