Create column in a pandas DataFrame based on whether value exists in a different DataFrame column

Question

I would like to add a column to dfA based on whether or not the job title (and its matching State) exists in dfB.

dfA=

Title     State    Income
Cashier   WY       15000
Cashier   WY       20000
Cashier   WY       15000
Manager   WY       25000
Cashier   CO       15000

dfB=

Title     State    MostFreqIncome
Cashier   WY       15000

In English: if a Title/State pair in dfA match any Title/State pair in dfB, create a new column in dfA which gives the MostFreqIncome attached to that Title/State pair.

Desired dfA:

Title     State    Income    MostFreqIncome
Cashier   WY       15000     15000
Cashier   WY       20000     15000
Cashier   WY       15000     15000
Manager   WY       25000     NA
Cashier   CO       15000     NA

Here's what I have so far:

is_in = dfA.Title.isin(dfB.Title) & dfA.State.isin(dfB.State)

This gives me False/True, but if it's True I want dfA.MostFreqIncome = dfB.MostFreqIncome. If it's False I want dfA.MostFreqIncome = 'NA'

Alex Riley · Accepted Answer

You can merge the two DataFrames A and B to create the new DataFrame:

>>> dfA.merge(dfB, on=['Title', 'State'], how='left')
     Title State  Income  MostFreqIncome
0  Cashier    WY   15000         15000.0
1  Cashier    WY   20000         15000.0
2  Cashier    WY   15000         15000.0
3  Manager    WY   25000             NaN
4  Cashier    CO   15000             NaN

Specifying how='left' here means that we're just only dfA's Title/State keys in the merged DataFrame.

Create column in a pandas DataFrame based on whether value exists in a different DataFrame column

Answers (1)

Related Questions