DiSt8885
DiSt8885

Reputation: 113

efficient way to update dataframe column based on condition

I want to update the values of one dataframe if conditions from a second frame applies.

Right now I'm looping through each row of the second frame to find it's location in the first frame and update that row when it's found but this is very resource inefficient.

Is there a better way of merging this?

For example, I want to update resolve to "yes" in T1 for values found in T2

T1
user_id   version        issue         Resolved
1         12345           crash           no   
2         12346           printing        no 
3         12347           screen          no
3         12348           screen          no


T2
user_id   version        issue 
1         12345           crash              
2         12346           printing              


Thanks!!

expected output:

T1
user_id   version        issue         Resolved
1         12345           crash           yes   
2         12346           printing        yes 
3         12347           screen          no
3         12348           screen          no

Upvotes: 3

Views: 74

Answers (1)

oppressionslayer
oppressionslayer

Reputation: 7214

You can add a yes resolved to your new data, then combine the two

T2['Resolved'] = 'yes'
dtypes = T1.dtypes.combine_first(T2.dtypes)
T3 = T2.combine_first(T1)
for k, v in dtypes.iteritems(): 
    T3[k] = T3[k].astype(v) 

output:

   user_id  version     issue Resolved
0        1    12345     crash      yes
1        2    12346  printing      yes
2        3    12347    screen       no
3        3    12348    screen       no

I found a way to combine without converting back to floats as well:

T3 = T2.astype('object').combine_first(T1)

so you shouldn't need the for loops anymore

Check this solution where the combine_first didn't work. I create a map to the yes and apply them to the first frame:

T1 = pd.DataFrame({'A': [1,1,2,3], 'B': ['no', 'no','no','no'], 'C': ['no1', 'no2','no3','no4']})
T2 = pd.DataFrame({'A': [2, 3], 'B': ['yes','yes']})
mapitems = T2.set_index('A').to_dict()['B']   
T1['B'] = T1.apply(lambda x: mapitems[x.A] if x.A in mapitems.keys() else x.B, axis=1)

Upvotes: 2

Related Questions