Reputation: 12134
I have 2 dataframes. df1
holds my data and df2
has some corrections. I'd like to replace the speedup
values in df1
with the value from df2
where the other columns in df2
specify where in df1
to replace the value.
df1 = pd.DataFrame({
'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
'date': pd.bdate_range(end='2019-12-12', periods=8)
})
df1['speedup'] = 1.0
df2 = pd.DataFrame({
'subject': ['Maths', 'Physics'],
'date': ['2019-12-04', '2019-12-10'],
'speedup': [1.1, 0.7]
})
The above will produce DataFrames which look like this:
Out[1]:
subject grade date speedup
0 English D 2019-12-03 1.0
1 Maths A 2019-12-04 1.0
2 Physics A 2019-12-05 1.0
3 English C 2019-12-06 1.0
4 Arts F 2019-12-09 1.0
5 Physics B 2019-12-10 1.0
6 English C 2019-12-11 1.0
7 PE A 2019-12-12 1.0
df2
Out[2]:
subject date speedup
0 Maths 2019-12-04 1.1
1 Physics 2019-12-10 0.7
To avoid confusion, I want to make df1
look like this after merging it with df2
:
df1 = pd.DataFrame({
'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
'date': pd.bdate_range(end='2019-12-12', periods=8),
'speedup': [1, 1.1, 1, 1, 1, 0.7, 1, 1]
})
I tried this which didn't work
df1[(df1['date'].isin(df2['date'])) & (df1['subject'].isin(df2['subject']))]['speedup'] = df2['speedup']
Merging doesn't work because of the datatime component in the merge key.
df1.merge(df2, left_on=['subject', 'date'], right_on=['subject', 'date'], suffixes=('', '_y'))
Upvotes: 0
Views: 84
Reputation: 3294
Instead of merging the two dataframes, I think a more efficient approach would be to use the second dataframe df2
as a dictionary with two indices.
Code:
##Setting the index of df2 to(date,subject)
df2.set_index(['date','subject'],inplace = True)
##This step is to make sure that the indexed date of df2 matches the date of df1
df1['date'] =df1['date'].apply(lambda x: x.strftime('%Y-%m-%d'))
##Iterating over the rows in the df1 and finding the value for speedup if it exists in the second dataframe
for i,val in df1.iterrows():
#Check for a value for the tuple (date,subject) in the map
rep_speedup = df2.loc[df1.loc[i,['date','subject']],'speedup'].values
#If the map contained a speedup, then replace the existing speedup
if len(rep_speedup) >0:
df1.loc[i,'speedup'] =rep_speedup
This would print out df1
as follows:
subject grade date speedup
0 English D 2019-12-03 1.0
1 Maths A 2019-12-04 1.1
2 Physics A 2019-12-05 1.0
3 English C 2019-12-06 1.0
4 Arts F 2019-12-09 1.0
5 Physics B 2019-12-10 0.7
6 English C 2019-12-11 1.0
7 PE A 2019-12-12 1.0
Upvotes: 0
Reputation: 3097
convert string date into datetime
object then do merge
df2['date'] = pd.to_datetime(df2['date'], format='%Y-%m-%d')
df1.merge(df2, how='left', on=['subject', 'date']).fillna(method='ffill', axis=1)
which gives you the following result
Upvotes: 1