Jakub Sapko
Jakub Sapko

Reputation: 316

Pandas map between two dataframes into column

Let's say that I have a df1 like (there are more columns but only this one is relatable):

A
a1
a2
a3

and a df2 like:

A
a1
a3
a4
a7

The case is that df2 contains in column A (column names are the same both in df1 and df2) some of the values in df1, but not all of them. Now, I'd like to add a column "Found in df2?" to a df1, representing if the value was found or not. Example:

df1
A  Found in df2?
a1       Y
a2       N
a3       Y

I've tried np.where and some merging magic but couldn't wrap my head around this.

Upvotes: 1

Views: 954

Answers (3)

creanion
creanion

Reputation: 2743

Pandas merge has an indicator feature that tells you in which sides of the merge a value is found.

Like this:

dfm = pd.merge(df1, df2[['A']], how="left", indicator=True)

dfm['Found in df2'] = dfm["_merge"] == "both"

You get a column _merge that tells you "left_only" or "both" depending on where the A values are found. Use how="outer" if you want to know which values are only in df2 too.

Now, merge may reorder rows in df1 based on the order of A. See other questions on SO how to counteract that, if needed.

Upvotes: 1

sammfyu
sammfyu

Reputation: 11

If using pandas cannot solve your problem, I suggest you to turn them into list/arrays such that it is easier to work with.

Here is one way to work around:

import pandas as pd
df1 = pd.DataFrame({
    'A': [1,2,3]
})
df2 = pd.DataFrame({
    'A': [1,3,4,7]
})

df1['Found in df2?'] = ['Y' if x in df2['A'].values else 'N' for x in df1['A'].values]
display(df1)

Upvotes: 1

sophocles
sophocles

Reputation: 13821

You can use isin:

df['found in df2'] = df['A'].isin(df2['A'].values)

print(df)

    A   found in df2
0   a1  True
1   a2  False
2   a3  True

Setup

df = pd.DataFrame({'A':['a1','a2','a3']})
df2 = pd.DataFrame({'A':['a1','a3','a4','a7']})

Upvotes: 4

Related Questions