Aditya Mertia

Reputation: 515

Merge two pandas DataFrames on a common column and skip the right frame's common columns

I am trying to merge two pandas DataFrames (DF-1 and DF-2) on a common column (datetime). Both were imported from CSV files. I want to add the non-common columns from DF-2 to DF-1 and ignore every column of DF-2 that DF-1 already has.

DF-1

date        time   open  high  low  close  datetime             col1
2018-01-01  09:15  11    14    17   20     2018-01-01 09:15:00  101
2018-01-01  09:16  12    15    18   21     2018-01-01 09:16:00  102
2018-01-01  09:17  13    16    19   22     2018-01-01 09:17:00  103

DF-2

date        time   open  high  low  close  datetime             col2
2018-01-01  09:15  23    26    29   32     2018-01-01 09:15:00  104
2018-01-01  09:16  24    27    30   33     2018-01-01 09:16:00  105
2018-01-01  09:17  25    28    31   34     2018-01-01 09:17:00  106

Merged DF (what I want)

date        time   open  high  low  close  datetime             col1  col2
2018-01-01  09:15  11    14    17   20     2018-01-01 09:15:00  101   104
2018-01-01  09:16  12    15    18   21     2018-01-01 09:16:00  102   105
2018-01-01  09:17  13    16    19   22     2018-01-01 09:17:00  103   106

Code used: merged_left = pd.merge(left=DF1, right=DF2, how='left', left_on='datetime', right_on='datetime')

What I get: the two data frames merged, with the common columns duplicated as time_x, open_x, high_x, low_x, close_x, time_y, open_y, high_y, low_y, close_y, followed by col1 and col2.

I want to ignore all the _y columns and keep the _x ones.

Any help would be greatly appreciated.

Upvotes: 0

Views: 185

Answers (3)

Orenshi

Reputation: 1873

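You can use suffixes so that only the second dataframe's duplicate columns get a marker suffix ('_y'), while the first dataframe's columns keep their original names. Then you can drop the marked columns with filter, using a regex that rejects any name ending in '_y'.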

>>> df1
   a  b
0  1  2
>>> df2
   a  b  c
0  1  2  3
>>> df1.merge(df2, on=['a'], suffixes=['', '_y'])
   a  b  b_y  c
0  1  2    2  3
>>> df1.merge(df2, on=['a'], how='left', suffixes=['', '_y']).filter(regex='^(?!.*_y$)', axis=1)
   a  b  c
0  1  2  3

-- Edit -- I find filtering duplicate columns this way useful because it removes an arbitrary number of them, and you don't have to explicitly pass the names of the columns you want to keep.
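Applied to data shaped like the question's, the same idea could look like the sketch below. The frames are a trimmed-down stand-in for DF-1 and DF-2 (only some of the columns, typed in by hand rather than read from CSV), so treat the names df1, df2 and merged as illustrative assumptions.

```python
import pandas as pd

# Hand-built stand-ins for DF-1 and DF-2 (subset of the question's columns).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
df1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
df2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Left columns keep their names (empty suffix); clashing right columns get "_y".
# The negative lookahead then keeps every column whose name does not end in "_y".
merged = (
    df1.merge(df2, on="datetime", how="left", suffixes=("", "_y"))
       .filter(regex=r"^(?!.*_y$)")
)

print(merged)
```

One caveat: a column in either frame whose real name already ends in "_y" would be filtered out too, so a more distinctive suffix may be safer in that case.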

Upvotes: 3

chris dorn

Reputation: 845

You could use a list comprehension to collect all the '_y' columns, then pass that list to DataFrame.drop

# endswith avoids also matching names that merely contain '_y', such as 'x_year'
drop_labels = [col for col in merged_left.columns if col.endswith('_y')]
merged_left.drop(drop_labels, axis=1, inplace=True)

That will leave you with all the unique columns and the _x columns.
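Here is a small sketch of that drop step, assuming you also want the leftover "_x" suffix stripped afterwards. The frames and the column name daily_yield are made up for illustration; daily_yield is there to show why testing the end of the name is safer than searching for "_y" anywhere in it.

```python
import pandas as pd

# Hypothetical frames; "daily_yield" contains "_y" but is not a merge duplicate.
left = pd.DataFrame({"datetime": ["09:15", "09:16"], "open": [11, 12], "col1": [101, 102]})
right = pd.DataFrame({"datetime": ["09:15", "09:16"], "open": [23, 24], "daily_yield": [104, 105]})

# Default suffixes: clashing columns become open_x (left) and open_y (right).
merged_left = pd.merge(left, right, how="left", on="datetime")

# Collect only the columns whose name ends in "_y", then drop them.
drop_labels = [col for col in merged_left.columns if col.endswith("_y")]
trimmed = merged_left.drop(columns=drop_labels)

# Optional: strip the trailing "_x" so the left columns get their original names back.
trimmed.columns = [col[:-2] if col.endswith("_x") else col for col in trimmed.columns]

print(trimmed.columns.tolist())
```

Returning a new frame instead of using inplace=True keeps merged_left around for inspection, which can help while debugging.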

Upvotes: 0

BENY

Reputation: 323306

You can select the columns you need from the right frame within the merge, so the common columns never make it into the result

pd.merge(left=DF1, right=DF2[['datetime', 'col2']], how='left', left_on='datetime', right_on='datetime')
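If DF-2 has many extra columns, typing them out gets tedious. One possible variation, sketched here with hand-built stand-ins for the two frames, is to compute the non-common columns with Index.difference and select the join key plus those. The names df1, df2 and new_cols are assumptions for the example.

```python
import pandas as pd

# Hand-built stand-ins for DF-1 and DF-2 (subset of the question's columns).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
df1 = pd.DataFrame({"datetime": stamps, "open": [11, 12, 13], "col1": [101, 102, 103]})
df2 = pd.DataFrame({"datetime": stamps, "open": [23, 24, 25], "col2": [104, 105, 106]})

# Columns that exist only in df2 (Index.difference returns them sorted).
new_cols = df2.columns.difference(df1.columns).tolist()

# Merge only the join key plus the new columns, so no _x/_y suffixes are created.
merged = df1.merge(df2[["datetime"] + new_cols], on="datetime", how="left")

print(merged)
```

Because the common columns are never handed to merge, there is nothing to rename or drop afterwards.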

Upvotes: 2
