Reputation: 3382
I have a list of lists - representing a table with 4 columns and many rows (10000+).
Each sub-list contains 4 variables.
Here is a small part of my table:
['1810569', 'a', 5, '1241.52']
['1437437', 'a', 5, '1123.90']
['1437437', 'b', 5, '1232.43']
['1810569', 'b', 5, '1321.31']
['1810569', 'a', 5, '1993.52']
The first column represents the household ID, and the second represents the member ID within the household.
The fourth column represents weights that I want to sum - separately for each member.
For the example above I want the output to be:
['1810569', 'a', 5, '3235.04']
['1437437', 'a', 5, '1123.90']
['1437437', 'b', 5, '1232.43']
['1810569', 'b', 5, '1321.31']
In other words - I want to sum the weights in lines 1 and 5, since they are weights of the same member, while all the other members are distinct.
I saw something about groupby in pandas, but I didn't understand exactly how to use it for my problem.
Upvotes: 1
Views: 1208
Reputation: 180540
You could do it with a dict, using the first three elements as keys to group the data by:
d = {}
for k, b, c, w in l:  # l is your list of lists
    if (k, b, c) in d:
        d[k, b, c][-1] += float(w)
    else:
        d[k, b, c] = [k, b, c, float(w)]

from pprint import pprint as pp
pp(list(d.values()))
Output:
[['1810569', 'b', 5, 1321.31],
['1437437', 'b', 5, 1232.43],
['1437437', 'a', 5, 1123.9],
['1810569', 'a', 5, 3235.04]]
If you wanted to maintain a first seen order:
from collections import OrderedDict

d = OrderedDict()
for k, b, c, w in l:
    if (k, b, c) in d:
        d[k, b, c][-1] += float(w)
    else:
        d[k, b, c] = [k, b, c, float(w)]

from pprint import pprint as pp
pp(list(d.values()))
Output:
[['1810569', 'a', 5, 3235.04],
['1437437', 'a', 5, 1123.9],
['1437437', 'b', 5, 1232.43],
['1810569', 'b', 5, 1321.31]]
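As a side note: in Python 3.7+ plain dicts preserve insertion order, so a regular dict or a collections.defaultdict gives the same first-seen ordering without OrderedDict. A minimal sketch, assuming l is the list of lists from the question:

```python
from collections import defaultdict

l = [['1810569', 'a', 5, '1241.52'],
     ['1437437', 'a', 5, '1123.90'],
     ['1437437', 'b', 5, '1232.43'],
     ['1810569', 'b', 5, '1321.31'],
     ['1810569', 'a', 5, '1993.52']]

# defaultdict(float) starts every new key at 0.0, so we can just add
totals = defaultdict(float)
for household, member, count, weight in l:
    totals[household, member, count] += float(weight)

# rebuild rows in first-seen order (dicts keep insertion order in 3.7+)
result = [[h, m, c, w] for (h, m, c), w in totals.items()]
```

This avoids the if/else branch entirely, at the cost of a second pass to rebuild the rows.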
Upvotes: 0
Reputation: 394459
Assuming the following is your list, then this would work:
In [192]:
l=[['1810569', 'a', 5, '1241.52'],
['1437437', 'a', 5, '1123.90'],
['1437437', 'b', 5, '1232.43'],
['1810569', 'b', 5, '1321.31'],
['1810569', 'a', 5, '1993.52']]
l
Out[192]:
[['1810569', 'a', 5, '1241.52'],
['1437437', 'a', 5, '1123.90'],
['1437437', 'b', 5, '1232.43'],
['1810569', 'b', 5, '1321.31'],
['1810569', 'a', 5, '1993.52']]
In [201]:
# construct the df and convert the last column to float
import pandas as pd
df = pd.DataFrame(l, columns=['household ID', 'Member ID', 'some col', 'weights'])
df['weights'] = df['weights'].astype(float)
df
Out[201]:
household ID Member ID some col weights
0 1810569 a 5 1241.52
1 1437437 a 5 1123.90
2 1437437 b 5 1232.43
3 1810569 b 5 1321.31
4 1810569 a 5 1993.52
So we can now groupby on the household and member IDs and call sum on the 'weights' column:
In [200]:
df.groupby(['household ID', 'Member ID'])['weights'].sum().reset_index()
Out[200]:
household ID Member ID weights
0 1437437 a 1123.90
1 1437437 b 1232.43
2 1810569 a 3235.04
3 1810569 b 1321.31
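If you also want to keep the third column and get back a list of lists in the question's shape, one option (a sketch, building the same DataFrame as above) is to group on all three key columns with as_index=False:

```python
import pandas as pd

l = [['1810569', 'a', 5, '1241.52'],
     ['1437437', 'a', 5, '1123.90'],
     ['1437437', 'b', 5, '1232.43'],
     ['1810569', 'b', 5, '1321.31'],
     ['1810569', 'a', 5, '1993.52']]

df = pd.DataFrame(l, columns=['household ID', 'Member ID', 'some col', 'weights'])
df['weights'] = df['weights'].astype(float)

# grouping on all three key columns keeps 'some col' in the result;
# as_index=False makes the keys ordinary columns, so no reset_index is needed
summed = df.groupby(['household ID', 'Member ID', 'some col'],
                    as_index=False)['weights'].sum()
rows = summed.values.tolist()
```

Note that groupby sorts the group keys by default, so the rows come back in key order rather than first-seen order; pass sort=False to keep the original order.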
Upvotes: 2