luw

Reputation: 217

Replace some entries in a DataFrame column with values from another DataFrame

I have a DataFrame of user-product ratings, df1:

       USER_ID  PRODUCT_ID  RATING
0        0           0       0
1        1           1       1
2        2           2       2
3        3           3       3
4        4           4       4
5        5           5       5
6        6           6       6
7        7           7       7
8        8           8       8
9        9           9       9

Another DataFrame, df2, holds the true ratings for some users and products:

       USER_ID  PRODUCT_ID  RATING
0        0           0       10
1        1           1       10
2        2           2       10
3        3           3       10

I want to use the true ratings from df2 to replace the corresponding ratings in df1, so the result should be:

       USER_ID  PRODUCT_ID  RATING
0        0           0      10
1        1           1      10
2        2           2      10
3        3           3      10
4        4           4       4
5        5           5       5
6        6           6       6
7        7           7       7
8        8           8       8
9        9           9       9

Is there an operation that does this?

Upvotes: 2

Views: 48

Answers (2)

oppressionslayer

Reputation: 7214

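You can use combine_first, which takes values from df2 and falls back to df1 wherever df2 has no entry. Rows are matched by index label, which works here because the first four rows of df1 line up with df2:

df2.astype(object).combine_first(df1)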

  USER_ID PRODUCT_ID RATING
0       0          0     10
1       1          1     10
2       2          2     10
3       3          3     10
4       4          4      4
5       5          5      5
6       6          6      6
7       7          7      7
8       8          8      8
9       9          9      9
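
If the two frames do not share the same row order, matching on index labels would pair the wrong rows. A minimal sketch of the same combine_first idea, assuming the rows should instead be matched on USER_ID and PRODUCT_ID (the keys list and the final cast are illustrative additions, not part of the answer above):

```python
import pandas as pd

df1 = pd.DataFrame({"USER_ID": list(range(10)),
                    "PRODUCT_ID": list(range(10)),
                    "RATING": list(range(10))})
df2 = pd.DataFrame({"USER_ID": list(range(4)),
                    "PRODUCT_ID": list(range(4)),
                    "RATING": [10, 10, 10, 10]})

# Assumption: a (USER_ID, PRODUCT_ID) pair identifies a row in both frames.
keys = ["USER_ID", "PRODUCT_ID"]

# Index both frames by the key columns so alignment happens on the keys,
# then prefer df2's ratings and fall back to df1's.
result = (
    df2.set_index(keys)
       .combine_first(df1.set_index(keys))
       .reset_index()
)

# Depending on the pandas version, RATING may come back as float; cast it back.
result = result.astype({"RATING": int})
print(result)
```

The result is ordered by the sorted key pairs, which may differ from the original row order of df1.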

Upvotes: 0

Stepan

Reputation: 1054

import pandas as pd

# Rebuild the example frames from the question
rng = list(range(10))
df1 = pd.DataFrame({"USER_ID": rng,
                    "PRODUCT_ID": rng,
                    "RATING": rng})

rng_2 = list(range(4))
df2 = pd.DataFrame({"USER_ID": rng_2,
                    "PRODUCT_ID": rng_2,
                    "RATING": [10, 10, 10, 10]})

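Then use update, with both frames indexed by the key columns so that rows are matched on USER_ID and PRODUCT_ID:

# Index both frames by the keys so update aligns on them
df1 = df1.set_index(['USER_ID', 'PRODUCT_ID'])
df2 = df2.set_index(['USER_ID', 'PRODUCT_ID'])
# update modifies df1 in place with the matching values from df2
df1.update(df2)
# Restore the key columns as regular columns
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)
print(df1)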
       USER_ID  PRODUCT_ID  RATING
0        0           0    10.0
1        1           1    10.0
2        2           2    10.0
3        3           3    10.0
4        4           4     4.0
5        5           5     5.0
6        6           6     6.0
7        7           7     7.0
8        8           8     8.0
9        9           9     9.0
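
The output above shows RATING as floats. If integer ratings are needed, one alternative is a left merge on the key columns followed by fillna. This is a sketch of a different technique, not the update approach itself; the RATING_TRUE suffix and the keys list are names chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"USER_ID": list(range(10)),
                    "PRODUCT_ID": list(range(10)),
                    "RATING": list(range(10))})
df2 = pd.DataFrame({"USER_ID": list(range(4)),
                    "PRODUCT_ID": list(range(4)),
                    "RATING": [10, 10, 10, 10]})

# Assumption: a (USER_ID, PRODUCT_ID) pair identifies a row in both frames.
keys = ["USER_ID", "PRODUCT_ID"]

# Left merge keeps every row of df1; df2's rating lands in RATING_TRUE
# and is NaN where df2 has no matching row.
merged = df1.merge(df2, on=keys, how="left", suffixes=("", "_TRUE"))

# Prefer the true rating, fall back to the original, and restore int dtype.
merged["RATING"] = merged["RATING_TRUE"].fillna(merged["RATING"]).astype(int)
result = merged.drop(columns="RATING_TRUE")
print(result)
```

A left merge preserves the row order of df1, so the result lines up with the original frame.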

Upvotes: 3
