Trying to group by but only specific rows based on their value

Question

I am finding this issue quite complex:

I have the following df:

values_1    values_2    values_3    id    name
 0.1          0.2          0.3       1   AAAA_living_thing
 0.1          0.2          0.3       1   AAA_mammals
 0.1          0.2          0.3       1   AA_dog
 0.2          0.4          0.6       2   AAAA_living_thing
 0.2          0.4          0.6       2   AAA_something
 0.2          0.4          0.6       2   AA_dog

The output should be:

values_1    values_2    values_3    id    name
 0.3          0.6          0.9       3   AAAA_living_thing
 0.1          0.2          0.3       1   AAA_mammals
 0.1          0.2          0.3       1   AA_dog
 0.2          0.4          0.6       2   AAA_something
 0.2          0.4          0.6       2   AA_dog

It would be like a groupby().sum(), but applied only to the AAAA_living_thing rows, since the rows below each one are children of AAAA_living_thing.

Erfan · Accepted Answer

Separate the dataframe first, using query to get the rows with AAAA_living_thing and the rows without. Then use groupby on the first part and finally concat the two parts back together:

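import pandas as pd
# engine='python' lets query evaluate .str methods even when numexpr is installed
temp = df.query('name.str.startswith("AAAA")', engine='python').groupby('name', as_index=False).sum()
temp2 = df.query('~name.str.startswith("AAAA")', engine='python')
# Stack the summed rows on top of the untouched ones
final = pd.concat([temp, temp2])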

Output

   id               name  values_1  values_2  values_3
0   3  AAAA_living_thing       0.3       0.6       0.9
1   1        AAA_mammals       0.1       0.2       0.3
2   1             AA_dog       0.1       0.2       0.3
4   2      AAA_something       0.2       0.4       0.6
5   2             AA_dog       0.2       0.4       0.6

Another way is to build a key with np.where that is unique for every row which is not AAAA_living_thing, and then groupby on name + that key:

import numpy as np
# Key is 0 for every AAAA row and the row's own index label otherwise
s = np.where(df['name'].str.startswith('AAAA'), 0, df.index)
final = df.groupby(['name', s], as_index=False).sum()

Output

                name  values_1  values_2  values_3  id
0  AAAA_living_thing       0.3       0.6       0.9   3
1        AAA_mammals       0.1       0.2       0.3   1
2      AAA_something       0.2       0.4       0.6   2
3             AA_dog       0.1       0.2       0.3   1
4             AA_dog       0.2       0.4       0.6   2
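Note that groupby sorts the group keys by default, which is why the rows above come out ordered by name rather than in their original order. If the original order matters, passing sort=False should keep groups in order of first appearance. The following is a sketch under that assumption, again using the sample frame from the question; key is an illustrative name, and the helper level is dropped explicitly rather than relying on as_index=False.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "values_1": [0.1, 0.1, 0.1, 0.2, 0.2, 0.2],
    "values_2": [0.2, 0.2, 0.2, 0.4, 0.4, 0.4],
    "values_3": [0.3, 0.3, 0.3, 0.6, 0.6, 0.6],
    "id": [1, 1, 1, 2, 2, 2],
    "name": ["AAAA_living_thing", "AAA_mammals", "AA_dog",
             "AAAA_living_thing", "AAA_something", "AA_dog"],
})

# All AAAA rows share the key 0; every other row keeps its own index
# label, so together with its name it forms a group of size one.
key = np.where(df["name"].str.startswith("AAAA"), 0, df.index)

final = (
    df.groupby(["name", key], sort=False)   # keep first-appearance order
      .sum()
      .reset_index(level=1, drop=True)      # drop the helper key level
      .reset_index()                        # turn 'name' back into a column
      [df.columns]                          # restore original column order
)
print(final)
```

With the sample data this gives the summed AAAA_living_thing row first, followed by the remaining rows in their original order.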
