Reputation: 2273
I am finding this issue quite complex:
I have the following df:
values_1 values_2 values_3 id name
0.1 0.2 0.3 1 AAAA_living_thing
0.1 0.2 0.3 1 AAA_mammals
0.1 0.2 0.3 1 AA_dog
0.2 0.4 0.6 2 AAAA_living_thing
0.2 0.4 0.6 2 AAA_something
0.2 0.4 0.6 2 AA_dog
The output should be:
values_1 values_2 values_3 id name
0.3 0.6 0.9 3 AAAA_living_thing
0.1 0.2 0.3 1 AAA_mammals
0.1 0.2 0.3 1 AA_dog
0.2 0.4 0.6 2 AAA_something
0.2 0.4 0.6 2 AA_dog
It would be like a groupby().sum(), but applied only to the AAAA_living_thing rows, since the rows below each one are children of AAAA_living_thing.
Upvotes: 0
Views: 24
Reputation: 42916
Separate the dataframe first: use query to select the rows whose name starts with AAAA (the AAAA_living_thing rows) and, separately, the rows that don't. Then apply groupby(...).sum() to the first set and concat the two pieces back together:
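import pandas as pd
# Sum the top-level "AAAA" rows; engine='python' avoids errors with .str methods in query() on some pandas versions
temp = df.query('name.str.startswith("AAAA")', engine='python').groupby('name', as_index=False).sum()
# Keep every other row unchanged
temp2 = df.query('~name.str.startswith("AAAA")', engine='python')
final = pd.concat([temp, temp2])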
Output
id name values_1 values_2 values_3
0 3 AAAA_living_thing 0.3 0.6 0.9
1 1 AAA_mammals 0.1 0.2 0.3
2 1 AA_dog 0.1 0.2 0.3
4 2 AAA_something 0.2 0.4 0.6
5 2 AA_dog 0.2 0.4 0.6
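For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split / sum / concat idea. It rebuilds the sample frame from the question and swaps query for a plain boolean mask (my substitution, not part of the original answer); the names is_top, summed and rest are only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "values_1": [0.1, 0.1, 0.1, 0.2, 0.2, 0.2],
    "values_2": [0.2, 0.2, 0.2, 0.4, 0.4, 0.4],
    "values_3": [0.3, 0.3, 0.3, 0.6, 0.6, 0.6],
    "id": [1, 1, 1, 2, 2, 2],
    "name": ["AAAA_living_thing", "AAA_mammals", "AA_dog",
             "AAAA_living_thing", "AAA_something", "AA_dog"],
})

# True for the top-level rows that should be collapsed into one.
is_top = df["name"].str.startswith("AAAA")

# Sum the numeric columns (including id) of the top-level rows, per name.
summed = df[is_top].groupby("name", as_index=False).sum()

# All other rows pass through untouched.
rest = df[~is_top]

# Stack the pieces, restore the original column order, renumber the index.
final = pd.concat([summed, rest], ignore_index=True)[df.columns]
print(final)
```

The mask is computed once and reused, so the "starts with AAAA" condition is written in only one place.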
Another way is to build a grouping key with np.where that is constant for the AAAA_living_thing rows and unique (the row index) for every other row, then groupby on name plus that key:
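import numpy as np
# AAAA rows share key 0 and collapse into one group; every other row keeps its own index as a unique key
s = np.where(df['name'].str.startswith('AAAA'), 0, df.index)
final = df.groupby(['name', s], as_index=False).sum()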
Output
name values_1 values_2 values_3 id
0 AAAA_living_thing 0.3 0.6 0.9 3
1 AAA_mammals 0.1 0.2 0.3 1
2 AAA_something 0.2 0.4 0.6 2
3 AA_dog 0.1 0.2 0.3 1
4 AA_dog 0.2 0.4 0.6 2
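A runnable sketch of this key-based variant follows, with a few assumptions of my own: the key is stored in a temporary column (the hypothetical name _key), the shared value is -1 rather than 0 so it cannot coincide with a default integer index, and sort=False is passed so the rows keep the order asked for in the question.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "values_1": [0.1, 0.1, 0.1, 0.2, 0.2, 0.2],
    "values_2": [0.2, 0.2, 0.2, 0.4, 0.4, 0.4],
    "values_3": [0.3, 0.3, 0.3, 0.6, 0.6, 0.6],
    "id": [1, 1, 1, 2, 2, 2],
    "name": ["AAAA_living_thing", "AAA_mammals", "AA_dog",
             "AAAA_living_thing", "AAA_something", "AA_dog"],
})

# Shared key (-1) for the AAAA rows, the row's own index for everything else.
key = np.where(df["name"].str.startswith("AAAA"), -1, df.index)

final = (
    df.assign(_key=key)                       # temporary helper column
      .groupby(["name", "_key"], as_index=False, sort=False)
      .sum()                                  # only the AAAA rows actually merge
      .drop(columns="_key")                   # helper column no longer needed
)
print(final)
```

Because every non-AAAA row sits in a group of size one, sum() returns its values unchanged, so a single groupby covers both cases.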
Upvotes: 2