Reputation: 419
I have a df
ID Count_ Sum AA BB dist
0 3 0.0 50.0 300.0 0.0 100.000000
1 7 0.0 40.0 700.0 0.0 141.421356
2 4983 0.0 0.0 500.0 200.0 0.000000
3 4982 0.0 0.0 400.0 200.0 0.000000
4 4984 0.0 30.0 600.0 200.0 100.000000
5 4981 0.0 0.0 300.0 200.0 0.000000
6 4985 0.0 40.0 700.0 200.0 141.421356
And two lists
List1 = [3,4983,4984]
List2 = [7,4981,4985]
Both lists are generated by some other code, and let's say in this case List1 is generated by id=5 and List2 is generated by id=18.
I'd like to create a new df like this
id sum_dist
0 5 200.0
1 18 282.84
Where the new column sum_dist is created by summing all the values in column dist, filtered by List1 or List2. For example, for id=5 we look at List1, filter all rows in df whose ID column contains one of those values, and sum() the values in the dist column.
I'm having trouble writing a general solution that would also work on larger df's. Any suggestions?
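For concreteness, the computation described above can be spelled out directly for the two known lists (a sketch of the desired result, not yet a general solution):

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "Count_": [0.0] * 7,
    "Sum": [50.0, 40.0, 0.0, 0.0, 30.0, 0.0, 40.0],
    "AA": [300.0, 700.0, 500.0, 400.0, 600.0, 300.0, 700.0],
    "BB": [0.0, 0.0, 200.0, 200.0, 200.0, 200.0, 200.0],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

List1 = [3, 4983, 4984]
List2 = [7, 4981, 4985]

# Keep rows whose ID appears in each list, then sum their dist values
result = pd.DataFrame({
    "id": [5, 18],
    "sum_dist": [df.loc[df["ID"].isin(List1), "dist"].sum(),
                 df.loc[df["ID"].isin(List2), "dist"].sum()],
})
```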
Upvotes: 0
Views: 2864
Reputation: 323326
Ummm, check with:
from collections import ChainMap
mapdict = dict(ChainMap(*map(dict.fromkeys, [[3,4983,4984],[7,4981,4985]],[5,18])))
df.dist.groupby(df.ID.map(mapdict)).sum()
Out[8]:
ID
5.0 200.000000
18.0 282.842712
Name: dist, dtype: float64
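If you want the exact two-column frame from the question, the grouped Series can be reshaped like this (note the ids come out as floats, because IDs missing from the mapping, like 4982, become NaN group keys and force a float index):

```python
import pandas as pd
from collections import ChainMap

df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

# dict.fromkeys builds one ID -> id mapping per list; ChainMap merges them
mapdict = dict(ChainMap(*map(dict.fromkeys,
                             [[3, 4983, 4984], [7, 4981, 4985]], [5, 18])))
# mapdict == {3: 5, 4983: 5, 4984: 5, 7: 18, 4981: 18, 4985: 18}

out = (df.dist.groupby(df.ID.map(mapdict)).sum()
         .rename_axis("id").reset_index(name="sum_dist"))
```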
Upvotes: 0
Reputation: 452
I'd first get that id column into your df. Something like:
df['id'] = None
df.loc[df['ID'].isin(List1),'id'] = 5
df.loc[df['ID'].isin(List2),'id'] = 18
Then you can do the last step with a .groupby:
df.groupby(by = 'id')['dist'].agg('sum')
If you want to generalize further, I'd start by putting the lists into a dictionary with the ids. This will make the above code easier in a loop:
lists = {5: [3,4983,4984],
18: [7,4981,4985]}
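With the dictionary in place, the two .loc lines above collapse into a loop over its items (a sketch following this answer's approach, with the example data rebuilt for completeness):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [3, 7, 4983, 4982, 4984, 4981, 4985],
    "dist": [100.0, 141.421356, 0.0, 0.0, 100.0, 0.0, 141.421356],
})

lists = {5: [3, 4983, 4984],
         18: [7, 4981, 4985]}

# Label each row with the id whose list contains its ID; rows in no
# list keep None and are dropped by groupby automatically
df["id"] = None
for key, ids in lists.items():
    df.loc[df["ID"].isin(ids), "id"] = key

result = df.groupby("id")["dist"].sum().reset_index(name="sum_dist")
```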
Hope that helps!
EDIT: Fixed a bug with the .loc function.
Upvotes: 1