Lev
Lev

Reputation: 663

How to sum columns and append the result as a row in the same dataframe?

I have a dataframe like the following:

|   int_col_1   ||   int_col2   ||   str_col3   ||   float_col4   ||   float_col5   |
|---------------||--------------||--------------||----------------||----------------|
|      123      ||      456     ||     potato   ||     0.1        ||       -0.2     |
|      456      ||      456     ||     potato   ||     0.1        ||        5.0     |
|      456      ||      456     ||     potato   ||     0.1        ||       -0.2     |
|      ...      ||      ...     ||     ...      ||     ...        ||       ...      |

I want to sum all float_col_4 and float_col5 rows that are above 0 per different int_col_1 and append the result as a new row.

So the new dataframe would look like this.

|   int_col_1   ||   int_col2   ||   str_col3   ||   float_col4   ||   float_col5   |
|---------------||--------------||--------------||----------------||----------------|
|      123      ||      789     ||     potato   ||     0.1        ||       -0.2     |
|      Total    ||     NULL     ||       NULL   ||     0.1        ||        0.0     |
|      456      ||      734     ||     potato   ||     0.1        ||        5.0     |
|      456      ||      423     ||     potato   ||     0.1        ||       -0.2     |
|      Total    ||      NULL    ||       NULL   ||     0.2        ||        5.0     |
|      ...      ||      ...     ||     ...      ||     ...        ||       ...      |

How can I do this? I need that to be appended to the same dataframe for simplicity so I guess I will also have to cast the first column from int to str, or drop the "Total" to the str_col3?

Thanks!

Upvotes: 3

Views: 208

Answers (1)

jezrael
jezrael

Reputation: 862851

Use custom function in GroupBy.apply for append new rows, for filter values above 0 is used DataFrame.clip for replace lower values like 0 to 0, create default index and replace mising values by DataFrame.fillna with dict:

#if necessary sorting by first column
df = df.sort_values('int_col_1')


def f(x):
    x.loc[-1] = x[['float_col4','float_col5']].clip(lower=0).sum()
    #you can pass also mask function
    #x.loc[-1] = x[['float_col4','float_col5']].mask(lambda x: x < 0, 0).sum()
    return x

df = df.groupby('int_col_1').apply(f).reset_index(drop=True).fillna({'int_col_1':'Total'})
print (df)
  int_col_1  int_col2 str_col3  float_col4  float_col5
0       123     456.0   potato         0.1        -0.2
1     Total       NaN      NaN         0.1         0.0
2       456     456.0   potato         0.1         5.0
3       456     456.0   potato         0.1        -0.2
4     Total       NaN      NaN         0.2         5.0

Upvotes: 2

Related Questions