Reputation: 515
I am trying to merge two pandas DataFrames (DF-1 and DF-2) on a common column (datetime). Both were imported from CSV files. I want to add the non-common columns from DF-2 to DF-1 and ignore every column that DF-2 shares with DF-1.
DF-1
date time open high low close datetime col1
2018-01-01 09:15 11 14 17 20 2018-01-01 09:15:00 101
2018-01-01 09:16 12 15 18 21 2018-01-01 09:16:00 102
2018-01-01 09:17 13 16 19 22 2018-01-01 09:17:00 103
DF-2
date time open high low close datetime col2
2018-01-01 09:15 23 26 29 32 2018-01-01 09:15:00 104
2018-01-01 09:16 24 27 30 33 2018-01-01 09:16:00 105
2018-01-01 09:17 25 28 31 34 2018-01-01 09:17:00 106
Merged DF (what I want)
date time open high low close datetime col1 col2
2018-01-01 09:15 11 14 17 20 2018-01-01 09:15:00 101 104
2018-01-01 09:16 12 15 18 21 2018-01-01 09:16:00 102 105
2018-01-01 09:17 13 16 19 22 2018-01-01 09:17:00 103 106
Code used:
merged_left = pd.merge(left=DF1,right=DF2, how='left', left_on='datetime', right_on='datetime')
What I get: the two data frames merged, with the common columns named time_x, open_x, high_x, low_x, close_x, time_y, open_y, high_y, low_y, close_y, plus col1 and col2.
I want to drop all the _y columns and keep the _x ones.
Any help would be greatly appreciated.
Upvotes: 0
Views: 185
Reputation: 1873
You can use suffixes to make sure the second dataframe has its duplicate columns named a certain way. Then you can filter those columns out with filter:
>>> df1
a b
0 1 2
>>> df2
a b c
0 1 2 3
>>> df1.merge(df2, on=['a'], suffixes=['', '_y'])
a b b_y c
0 1 2 2 3
>>> df1.merge(df2, on=['a'], how='left', suffixes=['', '_y']).filter(regex='^(?!.*_y$).*$', axis=1)
a b c
0 1 2 3
-- Edit -- I find filtering dupe columns this way useful because you can have an arbitrary number of dupes and it will take them all out. You don't have to explicitly pass the column names you want to keep.
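Applied to the frames in the question, the same idea might look like the sketch below. The frames are trimmed stand-ins built from the question's sample values (only some of the shared columns are included), and the name `merged` is just an illustrative choice.

```python
import pandas as pd

# Trimmed stand-ins for the question's DF1 and DF2 (sample values from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Keep DF1's names unchanged (empty suffix) and tag DF2's duplicates with "_y",
# then keep only the columns whose names do not end in "_y".
merged = (
    DF1.merge(DF2, on="datetime", how="left", suffixes=("", "_y"))
       .filter(regex=r"^(?!.*_y$).*$", axis=1)
)

print(merged.columns.tolist())
```

One caveat with this approach: any column in DF1 whose own name already ends in _y would be filtered out as well, so a less common suffix may be safer in that case.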
Upvotes: 3
Reputation: 845
You could build a list of all the '_y' columns with a list comprehension, then pass it to DataFrame.drop:
drop_labels = [col for col in merged_left.columns if col.endswith('_y')]
merged_left.drop(drop_labels, axis=1, inplace=True)
That will leave you with all the unique columns and the _x columns.
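Since the desired output in the question has plain names (open, close, ...) rather than open_x, close_x, a follow-up rename is probably wanted too. Here is a minimal sketch, assuming trimmed stand-in frames built from the question's sample values and the default _x/_y suffixes:

```python
import pandas as pd

# Trimmed stand-ins for the question's DF1 and DF2 (sample values from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Default suffixes: shared columns become open_x/open_y, close_x/close_y.
merged_left = pd.merge(DF1, DF2, how="left", on="datetime")

# Drop every column that came from DF2 with a clashing name.
drop_labels = [col for col in merged_left.columns if col.endswith("_y")]
merged_left = merged_left.drop(columns=drop_labels)

# Strip the trailing "_x" so DF1's columns get their original names back.
merged_left = merged_left.rename(
    columns=lambda col: col[:-2] if col.endswith("_x") else col
)

print(merged_left.columns.tolist())
```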
Upvotes: 0
Reputation: 323306
You can select only the columns you need from DF2 within the merge, so no duplicate columns are created in the first place:
pd.merge(left=DF1, right=DF2[['datetime', 'col2']], how='left', on='datetime')
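If DF2 has many extra columns, listing them by hand gets tedious. One possible variation is to compute them with Index.difference, as in the sketch below. The frames are trimmed stand-ins built from the question's sample values, and `extra_cols` and `merged` are illustrative names.

```python
import pandas as pd

# Trimmed stand-ins for the question's DF1 and DF2 (sample values from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Columns that exist only in DF2 (note: difference() returns them sorted by name).
extra_cols = DF2.columns.difference(DF1.columns).tolist()

# Merge on the key, bringing along only DF2's non-shared columns.
merged = DF1.merge(DF2[["datetime"] + extra_cols], on="datetime", how="left")

print(merged.columns.tolist())
```

Because the shared columns never reach the merge, no suffixes are added and nothing has to be dropped or renamed afterwards.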
Upvotes: 2