Reputation: 145
I have a df similar to this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'frequency': [3, 5, 7, 8],
                   'name': ['a', 'b', 'c', 'd'],
                   'parent': [np.nan, 'a', 'a', 'b']})
which looks like this:
   frequency name parent
0          3    a    NaN
1          5    b      a
2          7    c      a
3          8    d      b
It is basically a tree structure, and I want to add a new column holding the summed frequency of each row's direct children. It should look like this:
   frequency name parent  sum_of_children
0          3    a    NaN               12
1          5    b      a                8
2          7    c      a                0
3          8    d      b                0
What is the best way to do it? My idea is to take, for each name, the subset of rows whose parent equals that name, and then sum the frequency of that subset. Is this a good approach, and what is the best way to implement it?
Upvotes: 0
Views: 1912
Reputation: 195438
Try:
df["sum_of_children"] = [
    df.loc[df["parent"] == n, "frequency"].sum() for n in df["name"]
]
print(df)
Prints:
   frequency name parent  sum_of_children
0          3    a    NaN               12
1          5    b      a                8
2          7    c      a                0
3          8    d      b                0
EDIT:
To get the sum of the children we use a list comprehension. Iterating over the "name" column, we select all rows where the "parent" column is equal to that name. Then we use Series.sum() to get the value (it skips NaN values and returns 0 when no rows match, so names without children are handled gracefully).
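The list comprehension scans the whole frame once per name, which may become slow on large frames. As a possible alternative (a sketch that is not part of the code above, and that assumes the values in "name" are unique), the totals can be computed once per parent with groupby and then looked up for each name. The variable child_sums is introduced here purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'frequency': [3, 5, 7, 8],
                   'name': ['a', 'b', 'c', 'd'],
                   'parent': [np.nan, 'a', 'a', 'b']})

# Total frequency per parent. groupby drops NaN keys by default,
# so the root row (parent is NaN) does not form a group.
child_sums = df.groupby('parent')['frequency'].sum()

# Look up each name's total; names that never appear as a parent
# come back as NaN, which we replace with 0.
df['sum_of_children'] = df['name'].map(child_sums).fillna(0).astype(int)
print(df)
```

For the sample data this should give the same sum_of_children column as the list comprehension. Like the original, it only counts direct children, not grandchildren.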
Upvotes: 1