Define Splitting Condition in Pandas Groupby

Question

I have a Pandas dataframe, for example:

df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0]], columns=['value', 'A', 'B', 'C'])

   value  A    B    C
0  10.0   0.0  9.0  7.2
1  5.4    1.0  0.0  8.0
2  1.7    4.4  3.2  0.0
3  5.4    3.1  6.1  0.0

For each column, I'd like to group all of the non-zero rows together and all of the zeroed rows together and then report the summed values (the first column). The result should look something like this:

Column  Group     Summed Value
A       zero      10.0
A       non-zero  12.5
B       zero      5.4
B       non-zero  17.1
C       zero      1.7
C       non-zero  15.4

I know that I can do this in NumPy but I'd like to understand if there is a way to accomplish this directly in Pandas. However, I can't figure out how to define the splitting condition in groupby

Stefan · Accepted Answer

You could

df = pd.DataFrame([[10.0, 0.0, 9.0, 7.2],[5.4, 1.0, 0.0, 8.0], [1.7, 4.4, 3.2, 0.0],[5.4, 3.1, 6.1, 0.0]], columns=['value', 'A', 'B', 'C'])
df = df.set_index('value').stack().reset_index()
df.columns = ['value', 'Column', 'split_value']
df = df.set_index('split_value')

def f(value):
    if value==0:
        return 'zero'
    else:
        return 'non-zero'
print(df.groupby(['Column', f])['value'].sum())

to get:

Column          
A       non-zero    12.5
        zero        10.0
B       non-zero    17.1
        zero         5.4
C       non-zero    15.4
        zero         7.1
Name: value, dtype: float64

Define Splitting Condition in Pandas Groupby

Answers (2)

Related Questions