Reputation: 739
I have one data frame where I would like to create new column from the sum of different rows within one group df["NEW_Salary"]
.
If grouped df
by the column Year
, month
& day
, I want to for each group to sum the rows where Combination
is True
to the rows where Combination
is False
.
import pandas as pd
data = {"Year":[2002,2002,2002,2002,2002,2010,2010,2010,2010,2010],
"Name":['Jason','Tom','KimJason','KimTom','Kim','Jason','Tom','KimJason','KimTom','Kim'],
"Combination":[False,False,True,True,False,False,False,True,True,False],
"Salary":[10,20,25,25,30,20,30,35,35,40]
}
df=pd.dataframe(data)
Year Month Day Name Combination Salary
0 2002 1 15 Jason False 10
1 2002 1 15 Tom False 20
2 2002 1 15 KimJason True 25
3 2002 1 15 KimTom True 25
4 2002 1 15 Kim False 30
5 2010 3 20 Jason False 20
6 2010 3 20 Tom False 30
7 2010 3 20 KimJason True 35
8 2010 3 20 KimTom True 35
9 2010 3 20 Kim False 40
10 2002 4 5 Mary False 10
11 2002 4 5 MaryTom True 20
12 2002 4 5 Tom False 30
df["New_Salary"]
would be created as following:
Name
is KimJason
,Salary
would be added to the Salary
rows where Name
is Kim
& Jason
Name
is KimTom
, Salary
would be added again to the Salary
rows where Name
is Kim
& Tom
KimTom
& KimJason
would be the same in the new column NEW_Salary
as in Salary
The expected output:
Year Month Day Name Combination Salary NEW_Salary
0 2002 1 15 Jason False 10 35
1 2002 1 15 Tom False 20 45
2 2002 1 15 KimJason True 25 25
3 2002 1 15 KimTom True 25 25
4 2002 1 15 Kim False 30 80
5 2010 3 20 Jason False 20 55
6 2010 3 20 Tom False 30 65
7 2010 3 20 KimJason True 35 35
8 2010 3 20 KimTom True 35 35
9 2010 3 20 Kim False 40 110
10 2002 4 5 Mary False 10 30
11 2002 4 5 MaryTom True 20 20
12 2002 4 5 Tom False 30 50
Is there an easy way to achieve this output? no matter how many groups I have ?
Upvotes: 1
Views: 70
Reputation: 7913
Here`s how you can do it, as far I can tell, it should also work for any groups.
First extract all rows with combination names to a dictionary
lookup = dict(tuple(df.loc[df['Combination']==True].groupby('Name')[['Year', 'Salary']]))
for key,value in lookup.items():
print(f"{key=}:\n{value}")
which looks like this: (value to each key is a df)
key='KimJason':
Year Salary
2 2002 25
7 2010 35
key='KimTom':
Year Salary
3 2002 25
8 2010 35
Then filter for rows where Combination
is False
, apply row by row the value of Salary
and add all values which are found for that name and year in the lookup dictionary. At the end update the df with the new Salary
values.
s = (df.loc[df['Combination']==False]
.apply(lambda row:
row['Salary'] + sum(lookup[x].loc[lookup[x]['Year']==row['Year'], 'Salary'].squeeze()
for x in lookup
if row['Name'] in x)
,axis=1)
)
df['Salary'].update(s)
print(df)
Output df
:
Year Name Combination Salary
0 2002 Jason False 35
1 2002 Tom False 45
2 2002 KimJason True 25
3 2002 KimTom True 25
4 2002 Kim False 80
5 2010 Jason False 55
6 2010 Tom False 65
7 2010 KimJason True 35
8 2010 KimTom True 35
9 2010 Kim False 110
Upvotes: 1