Reputation: 733
I would like to add some data (event_date
) from table B to table A, as described below. It looks like a join on event_id
, however this column contains duplicate values in both tables. There are more columns in both tables but I'm omitting them for clarity.
How to achieve the desired effect in Pandas and in SQL in the most direct way?
Table A:
id,event_id
1,123
2,123
3,456
4,456
5,456
Table B:
id,event_id,event_date
11,123,2017-02-06
12,456,2017-02-07
13,123,2017-02-06
14,456,2017-02-07
15,123,2017-02-06
16,123,2017-02-06
Desired outcome (table A + event_date):
id,event_id,event_date
1,123,2017-02-06
2,123,2017-02-06
3,456,2017-02-07
4,456,2017-02-07
5,456,2017-02-07
Upvotes: 1
Views: 531
Reputation: 210832
SQL part:
select distinct a.*, b.event_date
from table_a a
join table_b b
on a.event_id = b.event_id;
Upvotes: 1
Reputation: 76917
Using merge
, first drop duplicates from B
In [662]: A.merge(B[['event_id', 'event_date']].drop_duplicates())
Out[662]:
id event_id event_date
0 1 123 2017-02-06
1 2 123 2017-02-06
2 3 456 2017-02-07
3 4 456 2017-02-07
4 5 456 2017-02-07
Upvotes: 2
Reputation: 9721
You can use Pandas Merge to get the desired result. Finally get only the columns that you are interested from DataFrame
df_Final = pd.merge(df1,df2,on='event_id',how='left')
print df_Final[['id_y','event_id','event_date']]
output
id_y event_id event_date
0 1 123 2017-02-06
1 2 123 2017-02-06
2 3 456 2017-02-07
3 4 456 2017-02-07
4 5 456 2017-02-07
5 1 123 2017-02-06
6 2 123 2017-02-06
7 3 456 2017-02-07
8 4 456 2017-02-07
9 5 456 2017-02-07
10 1 123 2017-02-06
11 2 123 2017-02-06
12 1 123 2017-02-06
13 2 123 2017-02-06
Upvotes: 0