Reputation: 1288
EDIT: I realized I might have already asked a very similar question; testing to see if that solution works.
Is there a way to get the matching rows from two dataframes without using pandas.merge?
I've been using this: pd.merge(dfa, dfb, on=('A', 'B', 'C'), how='right'), but it doesn't work the way I would expect it to.
I have two dataframes. The first looks like this:
A B C
w 1 1
b 1 2
c 3 1
c 3 1
and the second looks like this:
A B C D
w 1 1 1/1/14
b 1 2 1/3/14
c 3 1 1/7/14
c 3 1 1/7/14
d 4 7 1/7/14
z 5 3 1/8/14
The first dataframe has been run through a query using pandasql, and for some reason pandasql has trouble handling dates even when they aren't used in the query (I get Error binding parameter 1 - probably unsupported type. even though the date column isn't referenced).
What I want in the end is:
A B C D
w 1 1 1/1/14
b 1 2 1/3/14
c 3 1 1/7/14
c 3 1 1/7/14
but when I use pandas.merge, I end up with:
A B C D
w 1 1 1/1/14
b 1 2 1/3/14
c 3 1 1/7/14
c 3 1 1/7/14
c 3 1 1/7/14
c 3 1 1/8/14
no matter whether I use inner, right, or left for how.
I was hoping I could just use pandasql and skip all of this with some kind of date-parsing option, but I can't seem to find one.
So, is there another way to do this that isn't pandas.merge or looping through all of the columns/rows of the dataframes?
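For reference, here is a minimal way to reproduce the behaviour (a sketch, assuming dfa and dfb are built directly from the sample data above):
import pandas as pd

# Frames built from the sample data in the question
dfa = pd.DataFrame({'A': ['w', 'b', 'c', 'c'],
                    'B': [1, 1, 3, 3],
                    'C': [1, 2, 1, 1]})
dfb = pd.DataFrame({'A': ['w', 'b', 'c', 'c', 'd', 'z'],
                    'B': [1, 1, 3, 3, 4, 5],
                    'C': [1, 2, 1, 1, 7, 3],
                    'D': ['1/1/14', '1/3/14', '1/7/14', '1/7/14', '1/7/14', '1/8/14']})

# The key ('c', 3, 1) appears twice in each frame, so the merge pairs the
# duplicates up 2 x 2 = 4 times; the extra rows show up regardless of how=.
print(pd.merge(dfa, dfb, on=['A', 'B', 'C'], how='inner'))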
Upvotes: 1
Views: 223
Reputation: 54330
Will concat with the inner option handle the case for you, if your dataframes are already aligned row for row?
In [46]: print(pd.concat((df1, df2), join='inner', axis=1).iloc[:, [0, 1, 2, -1]])
A B C D
0 w 1 1 1/1/14
1 b 1 2 1/3/14
2 c 3 1 1/7/14
3 c 3 1 1/7/14
The .iloc[:, [0, 1, 2, -1]] is there to avoid duplicating the A, B and C columns.
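If the positional selection feels fragile, an alternative sketch (not from the original answer) is to drop the repeated columns by label after the concat:
result = pd.concat((df1, df2), join='inner', axis=1)
# keep only the first occurrence of each column label, dropping the repeated A, B, C
print(result.loc[:, ~result.columns.duplicated()])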
Anyway, this may be a better and more general solution:
In [90]: print(pd.merge(df1.drop_duplicates(subset=['A', 'B', 'C']),
                        df2, on=['A', 'B', 'C']))
A B C D
0 w 1 1 1/1/14
1 b 1 2 1/3/14
2 c 3 1 1/7/14
3 c 3 1 1/7/14
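A rough end-to-end check of this approach (a sketch, assuming df1 and df2 are built from the sample data in the question): dropping the duplicate keys from df1 first means each df2 row can match at most one df1 row, so the c/3/1 rows are no longer multiplied.
import pandas as pd

df1 = pd.DataFrame({'A': ['w', 'b', 'c', 'c'],
                    'B': [1, 1, 3, 3],
                    'C': [1, 2, 1, 1]})
df2 = pd.DataFrame({'A': ['w', 'b', 'c', 'c', 'd', 'z'],
                    'B': [1, 1, 3, 3, 4, 5],
                    'C': [1, 2, 1, 1, 7, 3],
                    'D': ['1/1/14', '1/3/14', '1/7/14', '1/7/14', '1/7/14', '1/8/14']})

merged = pd.merge(df1.drop_duplicates(subset=['A', 'B', 'C']),
                  df2, on=['A', 'B', 'C'])
assert len(merged) == 4                        # w, b and the two c/3/1 rows
assert list(merged.columns) == ['A', 'B', 'C', 'D']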
Upvotes: 2