Add new column based on subset of dataframe

Question

I have a df similar to this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'frequency': [3, 5, 7, 8],
                   'name': ['a', 'b', 'c', 'd'],
                   'parent': [np.nan, 'a', 'a', 'b']})

which looks like this:

   frequency name parent
0          3    a    NaN
1          5    b      a
2          7    c      a
3          8    d      b

It is basically a tree structure and what I want is to sum the frequency of the children in a new column. It should look like this:

   frequency name parent  sum_of_children
0          3    a    NaN               12
1          5    b      a                8
2          7    c      a                0
3          8    d      b                0

What is the best way to do this? My idea is to take, for each name, the subset of the df where parent == that name, and then sum the frequency of that subset. Is this a good approach, and how is it best implemented?

Andrej Kesely · Accepted Answer

Try:

df["sum_of_children"] = [
    df.loc[df["parent"] == n, "frequency"].sum() for n in df["name"]
]
print(df)

Prints:

   frequency name parent  sum_of_children
0          3    a    NaN               12
1          5    b      a                8
2          7    c      a                0
3          8    d      b                0

EDIT:

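To get the sum of children we use a list comprehension. Iterating over the "name" column, we select all rows whose "parent" is equal to the current name, then call Series.sum() on their "frequency" values. A name with no children selects no rows, and the sum of an empty Series is 0. The NaN parent of the root row never compares equal to any name, so it is never matched, and Series.sum() skips NaN values by default anyway.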
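The list comprehension rescans the whole "parent" column once per name, which may become slow on a large frame. As an alternative sketch (not part of the original answer), the totals can be computed once with groupby and then looked up per name with Series.map. The variable name child_totals is only illustrative, and the final cast to int assumes the frequencies are integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'frequency': [3, 5, 7, 8],
                   'name': ['a', 'b', 'c', 'd'],
                   'parent': [np.nan, 'a', 'a', 'b']})

# Total frequency per parent, computed in a single pass.
# groupby drops the NaN parent key by default (dropna=True).
child_totals = df.groupby('parent')['frequency'].sum()

# Look up each row's name in those totals. Names that never appear
# as a parent map to NaN, which is replaced with 0.
df['sum_of_children'] = df['name'].map(child_totals).fillna(0).astype(int)
print(df)
```

For the sample data this should give the same column as the list comprehension, since only direct children are summed in both versions.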
