Reputation: 394
The problem I'm having doesn't seem too complex, but I'm struggling to find a good way to go about it. All I need to do is create a new dataframe from the data below that sums up the 'Minutes' column for each ID appearing in 'P1'-'P5'. The only method I've come up with would be to create 5 different df's (one for each 'P' column), append them, and then use groupby to get the minute totals, but that seems wildly inefficient.
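Roughly, the approach I was thinking of looks like this (an untested sketch, assuming the data is in a DataFrame called df):
import pandas as pd

# Build one small frame per 'P' column, rename each to a common 'P' column,
# stack them, then group to get the total minutes per ID.
parts = [df[[col, 'Minutes']].rename(columns={col: 'P'})
         for col in ['P1', 'P2', 'P3', 'P4', 'P5']]
totals = pd.concat(parts).groupby('P', as_index=False)['Minutes'].sum()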
Any ideas on a better way to approach this would be greatly appreciated.
Original:
P1 P2 P3 P4 P5 Minutes
1627759 1628369 201143 202330 202681 7
1626179 1628369 202330 203382 203935 5
1627759 1628369 201143 202681 203382 5
1627759 201143 202681 202694 203935 5
1626179 1628369 201143 202330 203935 4
1626179 1627759 202681 202694 203382 4
1626179 1628369 202694 203382 203935 3
1626179 1627759 201143 202681 202694 2
1626179 1628464 202330 202694 203935 2
1627759 201143 202330 202681 202694 2
1628369 201143 202330 202681 203935 2
1626179 1627759 1628369 201143 202681 1
1626179 202330 202694 203382 203935 1
1626179 1627759 201143 202694 203935 1
1626179 1627824 1628400 1628464 202954 1
1626179 1628369 1628464 202330 203935 1
1626179 1627824 1628400 1628464 203935 1
1627759 202330 202681 202694 203382 0
Expected outcome:
P Minutes
1627759 27
1626179 26
1628369 28
201143 29
1628464 5
202330 24
1627824 2
202681 28
202694 20
1628400 2
203382 18
203935 25
202954 1
Upvotes: 3
Views: 113
Reputation: 323276
Using wide_to_long. The index is reset and included in i so that the id columns uniquely identify each row (Minutes alone contains duplicates):
import pandas as pd

# Reset the index so ['Minutes', 'index'] uniquely identifies each row, unpivot
# the P1-P5 columns into a single 'P' column, then total the minutes per ID.
(pd.wide_to_long(df.reset_index(), ['P'], i=['Minutes', 'index'], j='drop')
   .reset_index(level=0)
   .groupby('P')
   .Minutes
   .sum())
P
201143 29
202330 24
202681 28
202694 20
202954 1
203382 18
203935 25
1626179 26
1627759 27
1627824 2
1628369 28
1628400 2
1628464 5
Name: Minutes, dtype: int64
Upvotes: 1
Reputation: 57033
I am not sure if melting the dataframe is more efficient, but at least it does not involve explicit loops:
# Unpivot P1-P5 into a single 'value' column, then sum Minutes per ID
(pd.melt(df, id_vars='Minutes', value_vars=df.columns[:-1])
   .groupby('value')['Minutes'].sum())
This solution "stretches" the first five columns into a tall one and uses it for grouping. According to %%timeit
, this solution is twice as fast as yours (for the given dataframe).
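For example, the timing can be run in a Jupyter cell roughly like this (illustrative only; df holds the example data):
%%timeit
pd.melt(df, id_vars='Minutes', value_vars=df.columns[:-1])\
    .groupby('value')['Minutes'].sum()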
Upvotes: 4