Praveen
Praveen

Reputation: 2177

Replace rows in a Pandas df with rows from another df of different size

I have 2 Pandas dfs, A and B , with some matching columns but different number of rows. I want to copy values of matching columns from B to A based on some conditions. I have tried this:

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])

df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print (df1)
   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

This works when the number of rows are same, but if B has fewer rows, the index is not aligned and i get NaN in the final df. For example, df2 has one fewer rows and the row indexes are not aligned

df2 = pd.DataFrame([list(s1), list(s2),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]

print (df1)
       A     B    C
    0  5   6.0    p
    1  6   7.0    q
    2  7   8.0    r
    3  8  10.0    t
    4  9   NaN  NaN

How to do this and copy the value if the values in columns A are same ?

Upvotes: 2

Views: 3117

Answers (2)

Praveen
Praveen

Reputation: 2177

The only way i found to solve this is as below:

df = df1.merge(df2, how='left', left_on='A', right_on = 'A')

def select(row, field):
    if row['{}_y'.format(field)] is np.nan:
        return row['{}_x'.format(field)]
    else:
        return row['{}_y'.format(field)]

df['C'] = df[['C_x', 'C_y']].apply(select, field='C', axis=1)
df['B'] = df[['B_x', 'B_y']].apply(select, field='B', axis=1)

Seems roundabout but cannot think of a better solution.

Upvotes: 0

ilia timofeev
ilia timofeev

Reputation: 1119

It seems like you are looking for pd.update Example from help,

df = pd.DataFrame({'A': [1, 2, 3],
                'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, np.nan, 6]})
df.update(new_df)

df

result

   A      B
0  1    4.0
1  2  500.0
2  3    6.0

Example from help, but note that index is important. You are using default index which is initial value order, but if you are updating values from dataframe with different length it may be important which rows you want to update.

Upvotes: 3

Related Questions