sereizam
sereizam

Reputation: 2120

Group by value of sum of columns with Pandas

I got lost in Pandas doc and features trying to figure out a way to groupby a DataFrame by the values of the sum of the columns.

for instance, let say I have the following data :

In [2]: dat = {'a':[1,0,0], 'b':[0,1,0], 'c':[1,0,0], 'd':[2,3,4]}

In [3]: df = pd.DataFrame(dat)

In [4]: df
Out[4]: 
   a  b  c  d
0  1  0  1  2
1  0  1  0  3
2  0  0  0  4

I would like columns a, b and c to be grouped since they all have their sum equal to 1. The resulting DataFrame would have columns labels equals to the sum of the columns it summed. Like this :

   1  9
0  2  2
1  1  3
2  0  4

Any idea to put me in the good direction ? Thanks in advance !

Upvotes: 4

Views: 8063

Answers (2)

TomAugspurger
TomAugspurger

Reputation: 28926

Here you go:

In [57]: df.groupby(df.sum(), axis=1).sum()
Out[57]: 
   1  9
0  2  2
1  1  3
2  0  4

[3 rows x 2 columns]

df.sum() is your grouper. It sums over the 0 axis (the index), giving you the two groups: 1 (columns a, b, and, c) and 9 (column d) . You want to group the columns (axis=1), and take the sum of each group.

Upvotes: 10

LondonRob
LondonRob

Reputation: 78673

Because pandas is designed with database concepts in mind, it's really expected information to be stored together in rows, not in columns. Because of this, it's usually more elegant to do things row-wise. Here's how to solve your problem row-wise:

dat = {'a':[1,0,0], 'b':[0,1,0], 'c':[1,0,0], 'd':[2,3,4]}
df = pd.DataFrame(dat)

df = df.transpose()
df['totals'] = df.sum(1)

print df.groupby('totals').sum().transpose()
#totals  1  9
#0       2  2
#1       1  3
#2       0  4

Upvotes: 0

Related Questions