Reputation: 113
I want to update the values of one dataframe if conditions from a second frame applies.
Right now I'm looping through each row of the second frame to find it's location in the first frame and update that row when it's found but this is very resource inefficient.
Is there a better way of merging this?
For example, I want to update resolve to "yes" in T1 for values found in T2
T1
user_id version issue Resolved
1 12345 crash no
2 12346 printing no
3 12347 screen no
3 12348 screen no
T2
user_id version issue
1 12345 crash
2 12346 printing
Thanks!!
expected output:
T1
user_id version issue Resolved
1 12345 crash yes
2 12346 printing yes
3 12347 screen no
3 12348 screen no
Upvotes: 3
Views: 74
Reputation: 7214
You can add a yes resolved to your new data, then combine the two
T2['Resolved'] = 'yes'
dtypes = T1.dtypes.combine_first(T2.dtypes)
T3 = T2.combine_first(T1)
for k, v in dtypes.iteritems():
T3[k] = T3[k].astype(v)
output:
user_id version issue Resolved
0 1 12345 crash yes
1 2 12346 printing yes
2 3 12347 screen no
3 3 12348 screen no
I found a way to combine without converting back to floats as well:
T3 = T2.astype('object').combine_first(T1)
so you shouldn't need the for loops anymore
Check this solution where the combine_first didn't work. I create a map to the yes and apply them to the first frame:
T1 = pd.DataFrame({'A': [1,1,2,3], 'B': ['no', 'no','no','no'], 'C': ['no1', 'no2','no3','no4']})
T2 = pd.DataFrame({'A': [2, 3], 'B': ['yes','yes']})
mapitems = T2.set_index('A').to_dict()['B']
T1['B'] = T1.apply(lambda x: mapitems[x.A] if x.A in mapitems.keys() else x.B, axis=1)
Upvotes: 2