Reputation: 419
I have a df
ID Count_ Sum AA BB dist
0 3 0.0 50.0 300.0 0.0 100.000000
1 7 0.0 40.0 700.0 0.0 141.421356
2 4983 0.0 0.0 500.0 200.0 0.000000
3 4982 0.0 0.0 400.0 200.0 0.000000
4 4984 0.0 30.0 600.0 200.0 100.000000
5 4981 0.0 0.0 300.0 200.0 0.000000
6 4985 0.0 40.0 700.0 200.0 141.421356
And two lists
List1 = [3,4983,4984]
List2 = [7,4981,4985]
Both lists are generated by some other code, and let's say in this case List1 is generated by id=5 and List2 is generated by id=18.
I'd like to create a new df like this
id sum_dist
0 5 200.0
1 18 282.84
Where the new column sum_dist is created by summing all the values in column dist, filtered by List1 or List2. For example, for id=5 we look at List1, filter all rows in df whose ID column contains one of those values, and sum() the values in the dist column.
I'm having trouble writing a general solution that would also work on larger df's. Any suggestions?
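For concreteness, the computation described above can be spelled out directly for the two known lists (a sketch of the desired result, not yet a general solution):

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "Count_": [0.0] * 7,
    "Sum": [50.0, 40.0, 0.0, 0.0, 30.0, 0.0, 40.0],
    "AA": [300.0, 700.0, 500.0, 400.0, 600.0, 300.0, 700.0],
    "BB": [0.0, 0.0, 200.0, 200.0, 200.0, 200.0, 200.0],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

List1 = [3, 4983, 4984]
List2 = [7, 4981, 4985]

# Keep rows whose ID appears in each list, then sum their dist values
result = pd.DataFrame({
    "id": [5, 18],
    "sum_dist": [df.loc[df["ID"].isin(List1), "dist"].sum(),
                 df.loc[df["ID"].isin(List2), "dist"].sum()],
})
```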
Upvotes: 0
Views: 2864
Reputation: 323326
Ummm, check with:
from collections import ChainMap
mapdict = dict(ChainMap(*map(dict.fromkeys, [[3,4983,4984],[7,4981,4985]],[5,18])))
df.dist.groupby(df.ID.map(mapdict)).sum()
Out[8]:
ID
5.0 200.000000
18.0 282.842712
Name: dist, dtype: float64
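If you want the exact two-column frame from the question, the grouped Series can be reshaped like this (note the ids come out as floats, because IDs missing from the mapping, like 4982, become NaN group keys and force a float index):

```python
import pandas as pd
from collections import ChainMap

df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

# dict.fromkeys builds one ID -> id mapping per list; ChainMap merges them
mapdict = dict(ChainMap(*map(dict.fromkeys,
                             [[3, 4983, 4984], [7, 4981, 4985]], [5, 18])))
# mapdict == {3: 5, 4983: 5, 4984: 5, 7: 18, 4981: 18, 4985: 18}

out = (df.dist.groupby(df.ID.map(mapdict)).sum()
         .rename_axis("id").reset_index(name="sum_dist"))
```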
Upvotes: 0
Reputation: 452
I'd first get that id column into your df. Something like:
df['id'] = None
df.loc[df['ID'].isin(List1),'id'] = 5
df.loc[df['ID'].isin(List2),'id'] = 18
Then you can do the last step with a .groupby:
df.groupby(by = 'id')['dist'].agg('sum')
If you want to generalize further, I'd start by putting the lists into a dictionary with the ids. This will make the above code easier in a loop:
lists = {5: [3,4983,4984],
18: [7,4981,4985]}
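With the dictionary in place, the two .loc lines above collapse into a loop over its items (a sketch following this answer's approach, with the example data rebuilt for completeness):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

lists = {5: [3, 4983, 4984],
         18: [7, 4981, 4985]}

# Label each row with the id whose list contains its ID; rows in no
# list keep None and are dropped by groupby automatically
df["id"] = None
for key, ids in lists.items():
    df.loc[df["ID"].isin(ids), "id"] = key

result = df.groupby("id")["dist"].sum().reset_index(name="sum_dist")
```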
Hope that helps!
EDIT: Fixed a bug with the .loc function.
Upvotes: 1