Reputation: 2177
I have 2 Pandas dfs, A and B , with some matching columns but different number of rows. I want to copy values of matching columns from B to A based on some conditions. I have tried this:
s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])
df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)], columns = ["A", "B", "C"])
s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])
df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)], columns = ["A", "B", "C"])
df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print (df1)
A B C
0 5 6 p
1 6 7 q
2 7 8 r
3 8 9 s
4 9 10 t
This works when the number of rows are same, but if B has fewer rows, the index is not aligned and i get NaN in the final df. For example, df2 has one fewer rows and the row indexes are not aligned
df2 = pd.DataFrame([list(s1), list(s2),list(s4),list(s5)], columns = ["A", "B", "C"])
df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print (df1)
A B C
0 5 6.0 p
1 6 7.0 q
2 7 8.0 r
3 8 10.0 t
4 9 NaN NaN
How to do this and copy the value if the values in columns A are same ?
Upvotes: 2
Views: 3117
Reputation: 2177
The only way i found to solve this is as below:
df = df1.merge(df2, how='left', left_on='A', right_on = 'A')
def select(row, field):
if row['{}_y'.format(field)] is np.nan:
return row['{}_x'.format(field)]
else:
return row['{}_y'.format(field)]
df['C'] = df[['C_x', 'C_y']].apply(select, field='C', axis=1)
df['B'] = df[['B_x', 'B_y']].apply(select, field='B', axis=1)
Seems roundabout but cannot think of a better solution.
Upvotes: 0
Reputation: 1119
It seems like you are looking for pd.update Example from help,
df = pd.DataFrame({'A': [1, 2, 3],
'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, np.nan, 6]})
df.update(new_df)
df
result
A B
0 1 4.0
1 2 500.0
2 3 6.0
Example from help, but note that index is important. You are using default index which is initial value order, but if you are updating values from dataframe with different length it may be important which rows you want to update.
Upvotes: 3