s5s
s5s

Reputation: 12134

Set values in df1 which match values in df2

I have 2 dataframes. df1 holds my data and df2 has some corrections. I'd like to replace the speedup values in df1 with the value from df2 where the other columns in df2 specify where in df1 to replace the value.

df1 = pd.DataFrame({
    'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
    'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
    'date': pd.bdate_range(end='2019-12-12', periods=8)
})

df1['speedup'] = 1.0

df2 = pd.DataFrame({
    'subject': ['Maths', 'Physics'],
    'date': ['2019-12-04', '2019-12-10'],
    'speedup': [1.1, 0.7]
})

The above will produce DataFrames which look like this:

Out[1]: 
   subject grade       date  speedup
0  English     D 2019-12-03      1.0
1    Maths     A 2019-12-04      1.0
2  Physics     A 2019-12-05      1.0
3  English     C 2019-12-06      1.0
4     Arts     F 2019-12-09      1.0
5  Physics     B 2019-12-10      1.0
6  English     C 2019-12-11      1.0
7       PE     A 2019-12-12      1.0
df2
Out[2]: 
   subject        date  speedup
0    Maths  2019-12-04      1.1
1  Physics  2019-12-10      0.7

To avoid confusion, I want to make df1 look like this after merging it with df2:

df1 = pd.DataFrame({
        'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
        'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
        'date': pd.bdate_range(end='2019-12-12', periods=8),
        'speedup': [1, 1.1, 1, 1, 1, 0.7, 1, 1]
    })

I tried this which didn't work

df1[(df1['date'].isin(df2['date'])) & (df1['subject'].isin(df2['subject']))]['speedup'] = df2['speedup']

Merging doesn't work because of the datatime component in the merge key.

df1.merge(df2, left_on=['subject', 'date'], right_on=['subject', 'date'], suffixes=('', '_y'))

Upvotes: 0

Views: 84

Answers (2)

paradocslover
paradocslover

Reputation: 3294

Instead of merging the two dataframes, I think a more efficient approach would be to use the second dataframe df2 as a dictionary with two indices.

Code:

##Setting the index of df2 to(date,subject)

df2.set_index(['date','subject'],inplace = True)


##This step is to make sure that the indexed date of df2 matches the date of df1

df1['date'] =df1['date'].apply(lambda x: x.strftime('%Y-%m-%d'))

##Iterating over the rows in the df1 and finding the value for speedup if it exists in the second dataframe

for i,val in df1.iterrows():
    #Check for a value for the tuple (date,subject) in the map
    rep_speedup = df2.loc[df1.loc[i,['date','subject']],'speedup'].values

    #If the map contained a speedup, then replace the existing speedup
    if len(rep_speedup) >0:
        df1.loc[i,'speedup']  =rep_speedup

This would print out df1 as follows:

    subject grade   date    speedup
0   English D   2019-12-03  1.0    
1   Maths   A   2019-12-04  1.1    
2   Physics A   2019-12-05  1.0    
3   English C   2019-12-06  1.0    
4   Arts    F   2019-12-09  1.0    
5   Physics B   2019-12-10  0.7    
6   English C   2019-12-11  1.0    
7   PE      A   2019-12-12  1.0    

Upvotes: 0

Prince Francis
Prince Francis

Reputation: 3097

convert string date into datetime object then do merge

df2['date'] = pd.to_datetime(df2['date'], format='%Y-%m-%d')
df1.merge(df2, how='left', on=['subject', 'date']).fillna(method='ffill', axis=1)

which gives you the following result enter image description here

Upvotes: 1

Related Questions