Reputation: 75
I have two data frames that I would like to combine based on certain conditions. This is the first data frame, each line represents one obversation (thus IDs occure multiple times):
df1
ID Count Publication
0 A 10 1990
1 B 15 1990
2 A 17 1990
3 B 19 1991
4 A 13 1991
This is the second data frame. Here, each ID is shown only once but over time (here 1990 to 1993).
df2
ID 1990 1991 1992 1993
0 A 1.1 1.2 1.3 1.4
1 B 2.3 2.4 2.4 2.6
2 C 3.4 3.5 3.6 3.7
3 D 4.5 4.6 4.7 4.8
My goal is to add a results column to df1, in which I multiply the value from the df1["Count"] column with the respective value (ID-Year pair) from df2, e.g. first line: "ID" A in "1990" is 1.1 multiplied with "Count" 10 = 11.
results
ID Count Publication Results
0 A 10 1990 11.0
1 B 15 1990 34.5
2 A 17 1990 18.7
3 B 19 1991 45.6
4 A 13 1991 15.6
So far I have tried multiple options using pandas .apply()
function but it did not work. I have also tried to .merge()
the columns from df2 to df1 based on IDs but I still fail to make the calculation afterwards (I was hoping this simplies the problem).
Question: Is there an easy an efficient way to go throug df1 row by row and "pick" the corresponding values from df2 for the calculation?
Upvotes: 3
Views: 177
Reputation: 323226
Using lookup
df2.set_index('ID').lookup(df1.ID,df1.Publication.astype(str))
Out[189]: array([1.1, 2.3, 1.1, 2.4, 1.2])
df1['Results']=df2.set_index('ID').lookup(df1.ID,df1.Publication.astype(str))*(df1.Count)
df1
Out[194]:
ID Count Publication Results
0 A 10 1990 11.0
1 B 15 1990 34.5
2 A 17 1990 18.7
3 B 19 1991 45.6
4 A 13 1991 15.6
Upvotes: 3
Reputation: 846
I don't really know how efficient it is, but you can do that:
df1 = df1.set_index(['ID', 'Publication'])
df2 = df2.set_index('ID').stack()
df2.index.rename(['ID', 'Publication'], inplace=True)
df1['df2_value'] = df2
df1['result'] = df1['Count'] * df1['df2_value']
Upvotes: 1