PointXIV
PointXIV

Reputation: 1288

Get matches between two dataframes in python

EDIT: Realized I might have already asked a very similar question, testing to see if that solution works.

Is there a way to get the matches from two dataframes without using pandas.merge?

I've been using this: pd.merge(dfa, dfb, on=('A', 'B', 'C'), how='right'), but it doesn't work how I would expect it to.

I have two dataframes, one that is like this:

A     B     C
w     1     1
b     1     2
c     3     1
c     3     1

and the second looking like

A     B     C    D
w     1     1    1/1/14
b     1     2    1/3/14
c     3     1    1/7/14
c     3     1    1/7/14
d     4     7    1/7/14
z     5     3    1/8/14

The first dataframe has been run through a query using pandasql, and for some reason pandasql has trouble handling dates even if they're not being used in the query (I get Error binding parameter 1 - probably unsupported type. even when it's not being used)

What I want in the end is

A     B     C    D
w     1     1    1/1/14
b     1     2    1/3/14
c     3     1    1/7/14
c     3     1    1/7/14

but when I use pandas.merge, I end up with

A     B     C    D
w     1     1    1/1/14
b     1     2    1/3/14
c     3     1    1/7/14
c     3     1    1/7/14
c     3     1    1/7/14
c     3     1    1/8/14

no matter if I use inner, right or left for how.

I was hoping I would just be able to use pandasql and skip all this by some kind of date parsing option but I can't seem to find any.

So, is there another way to do this that isn't pandas.merge or looping through all the columns/rows of the dataframes?

Upvotes: 1

Views: 223

Answers (1)

CT Zhu
CT Zhu

Reputation: 54330

Will concat with inner option handle the case for you, if your dataframes are already matched?

In [46]:

print pd.concat((df1, df2), join='inner', axis=1)[[0,1,2,-1]]
   A  B  C       D
0  w  1  1  1/1/14
1  b  1  2  1/3/14
2  c  3  1  1/7/14
3  c  3  1  1/7/14

[[0,1,2,-1]] is to avoid the duplicating A, B and C columns

Anyway, this may be a better and more general solution:

In [90]:

print pd.merge(df1.drop_duplicates(subset=['A', 'B', 'C']),
               df2, on=['A','B','C'])
   A  B  C       D
0  w  1  1  1/1/14
1  b  1  2  1/3/14
2  c  3  1  1/7/14
3  c  3  1  1/7/14

Upvotes: 2

Related Questions