Gunay Abdullayeva
Gunay Abdullayeva

Reputation: 711

How to merge 3 dataset in pandas

I have 3 datasets: restaurants.csv, users.csv and rating.csv. Restaurants.csv contains the restaurant ids and some features. Users.csv contains user ids and some user features. Raiting.csv contains restaurant ids, user ids and correspondig ratings.

place_id feature1 feature2 1 .. ..
2 .. ..

user_id feature1 feature2 1 .. ..
2 .. ..

place_id user_id raiting 1 1 2
2 1 3

I want to merge these 3 files to get one dataset which contains restaurants, users features and corresponding ratings as a label. I would like to get the dataset as follows:

place_id user_id place_feature1 ... user_feature2 raiting 1 1 .. 2
2 2 .. 3

How can I do it?

Upvotes: 1

Views: 11528

Answers (1)

jezrael
jezrael

Reputation: 863301

I think you need double merge with left join:

Sample DataFrames:

df1 = pd.DataFrame({'place_id':[1,2,3,4],
                   'B':[4,5,4,7],
                   'C':[7,8,9,4]})

print (df1)
   B  C  place_id
0  4  7         1
1  5  8         2
2  4  9         3
3  7  4         4

df2 = pd.DataFrame({'user_id':[1,2,3,4],
                   'D':[40,50,40,70],
                   'E':[70,80,90,40]})

print (df2)
    D   E  user_id
0  40  70        1
1  50  80        2
2  40  90        3
3  70  40        4

df3 = pd.DataFrame({'user_id':[1,2,3,4,1,2],
                   'place_id':[1,1,1,1,2,2],
                   'rating':[7,8,9,4,4,5]})

print (df3)
   place_id  rating  user_id
0         1       7        1
1         1       8        2
2         1       9        3
3         1       4        4
4         2       4        1
5         2       5        2

For real data use read_csv:

#df1 = pd.read_csv('restaurants.csv')
#df2 = pd.read_csv('users.csv')
#df3 = pd.read_csv('rating.csv')

df = df3.merge(df1, on='place_id', how='left').merge(df2, on='user_id', how='left')
print (df)
   place_id  rating  user_id  B  C   D   E
0         1       7        1  4  7  40  70
1         1       8        2  4  7  50  80
2         1       9        3  4  7  40  90
3         1       4        4  4  7  70  40
4         2       4        1  5  8  40  70
5         2       5        2  5  8  50  80

Upvotes: 4

Related Questions