Reputation: 6899
I have a Pandas dataframe, for example:
df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0]], columns=['value', 'A', 'B', 'C'])
value A B C
0 10.0 0.0 9.0 7.2
1 5.4 1.0 0.0 8.0
2 1.7 4.4 3.2 0.0
3 5.4 3.1 6.1 0.0
For each column, I'd like to group all of the non-zero rows together and all of the zeroed rows together and then report the summed values (the first column). The result should look something like this:
Column Group Summed Value
A zero 10.0
A non-zero 12.5
B zero 5.4
B non-zero 17.1
C zero 1.7
C non-zero 15.4
I know that I can do this in NumPy but I'd like to understand if there is a way to accomplish this directly in Pandas. However, I can't figure out how to define the splitting condition in groupby
Upvotes: 0
Views: 433
Reputation: 42905
You could
df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0],[5.4, 3.1, 6.1, 0.0]], columns=['value', 'A', 'B', 'C'])
df = df.set_index('value').stack().reset_index()
df.columns = ['value', 'Column', 'split_value']
df = df.set_index('split_value')
def f(value):
if value==0:
return 'zero'
else:
return 'non-zero'
print(df.groupby(['Column', f])['value'].sum())
to get:
Column
A non-zero 12.5
zero 10.0
B non-zero 17.1
zero 5.4
C non-zero 15.4
zero 7.1
Name: value, dtype: float64
Upvotes: 2
Reputation: 269
It's not the most elegant solution, but it will do what you are looking to do.
import pandas as pd
df = pd.DataFrame({'value': [10.0, 5.4, 1.7, ], 'A': [0.0, 1.0, 4.4], 'B': [9.0, 0.0, 3.2], 'C': [7.2, 8.0, 0.0]})
column = []
group = []
vals = []
for c in ['A', 'B', 'C']:
vals.append(df[df[c] == 0.0]['value'].sum())
group.append('zero')
column.append(c)
vals.append(df[df[c] > 0.0]['value'].sum())
group.append('non-zero')
column.append(c)
ddf = pd.DataFrame({'Column': column, 'Group': group, 'Summed Value': vals})
Upvotes: 0