slaw
slaw

Reputation: 6899

Define Splitting Condition in Pandas Groupby

I have a Pandas dataframe, for example:

df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0]], columns=['value', 'A', 'B', 'C'])

   value  A    B    C
0  10.0   0.0  9.0  7.2
1  5.4    1.0  0.0  8.0
2  1.7    4.4  3.2  0.0
3  5.4    3.1  6.1  0.0

For each column, I'd like to group all of the non-zero rows together and all of the zeroed rows together and then report the summed values (the first column). The result should look something like this:

Column  Group     Summed Value
A       zero      10.0
A       non-zero  12.5
B       zero      5.4
B       non-zero  17.1
C       zero      1.7
C       non-zero  15.4

I know that I can do this in NumPy but I'd like to understand if there is a way to accomplish this directly in Pandas. However, I can't figure out how to define the splitting condition in groupby

Upvotes: 0

Views: 433

Answers (2)

Stefan
Stefan

Reputation: 42905

You could

df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0],[5.4, 3.1, 6.1, 0.0]], columns=['value', 'A', 'B', 'C'])
df = df.set_index('value').stack().reset_index()
df.columns = ['value', 'Column', 'split_value']
df = df.set_index('split_value')

def f(value):
    if value==0:
        return 'zero'
    else:
        return 'non-zero'
print(df.groupby(['Column', f])['value'].sum())

to get:

Column          
A       non-zero    12.5
        zero        10.0
B       non-zero    17.1
        zero         5.4
C       non-zero    15.4
        zero         7.1
Name: value, dtype: float64

Upvotes: 2

tornesi
tornesi

Reputation: 269

It's not the most elegant solution, but it will do what you are looking to do.

import pandas as pd

df = pd.DataFrame({'value': [10.0, 5.4, 1.7, ], 'A': [0.0, 1.0, 4.4], 'B': [9.0, 0.0, 3.2], 'C': [7.2, 8.0, 0.0]})

column = []
group = []
vals = []

for c in ['A', 'B', 'C']:
    vals.append(df[df[c] == 0.0]['value'].sum())
    group.append('zero')
    column.append(c)

    vals.append(df[df[c] > 0.0]['value'].sum())
    group.append('non-zero')
    column.append(c)


ddf = pd.DataFrame({'Column': column, 'Group': group, 'Summed Value': vals})

Upvotes: 0

Related Questions