Reputation: 733

Pandas/SQL join

I would like to add some data (event_date) from table B to table A, as described below. It looks like a join on event_id, however this column contains duplicate values in both tables. There are more columns in both tables but I'm omitting them for clarity.

How to achieve the desired effect in Pandas and in SQL in the most direct way?

Table A:

id,event_id
1,123
2,123
3,456
4,456
5,456

Table B:

id,event_id,event_date
11,123,2017-02-06
12,456,2017-02-07
13,123,2017-02-06
14,456,2017-02-07
15,123,2017-02-06
16,123,2017-02-06

Desired outcome (table A + event_date):

id,event_id,event_date
1,123,2017-02-06
2,123,2017-02-06
3,456,2017-02-07
4,456,2017-02-07
5,456,2017-02-07

Upvotes: 1

Answers (3)

MaxU - stand with Ukraine

Reputation: 210832

SQL part:

select distinct a.*, b.event_date
from table_a a
join table_b b
  on a.event_id = b.event_id;

Upvotes: 1

Zero

Reputation: 76917

Using merge, first drop duplicates from B

In [662]: A.merge(B[['event_id', 'event_date']].drop_duplicates())
Out[662]:
   id  event_id  event_date
0   1       123  2017-02-06
1   2       123  2017-02-06
2   3       456  2017-02-07
3   4       456  2017-02-07
4   5       456  2017-02-07

Upvotes: 2

Shijo

Reputation: 9721

You can use Pandas Merge to get the desired result. Finally get only the columns that you are interested from DataFrame

df_Final = pd.merge(df1,df2,on='event_id',how='left')
print df_Final[['id_y','event_id','event_date']]

output

    id_y  event_id  event_date
0      1       123  2017-02-06
1      2       123  2017-02-06
2      3       456  2017-02-07
3      4       456  2017-02-07
4      5       456  2017-02-07
5      1       123  2017-02-06
6      2       123  2017-02-06
7      3       456  2017-02-07
8      4       456  2017-02-07
9      5       456  2017-02-07
10     1       123  2017-02-06
11     2       123  2017-02-06
12     1       123  2017-02-06
13     2       123  2017-02-06

Upvotes: 0

Pandas/SQL join

Answers (3)

Related Questions