Subset pandas DataFrame based on two columns in another DataFrame

Question

I have the pandas DataFrame as shown below. df1 and df2 are both subsets of df. I want to now create two new subsets of df, let's call them dftrn and dftst. dftrn should contain all values of df that are in df1 and df2. dftst should contain values of df that are not in df1 and df2. What's the quickest way to do this?

df = pd.DataFrame({
    'product': ['prod1', 'prod1', 'prod1', 'prod2', 'prod2', 'prod2'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-04-01'],
    'value': [5.1, 5.2, 5.4, 2.3, 2.2, 2.4]
})

df1 = pd.DataFrame({
    'product': ['prod1', 'prod1'],
    'date': ['2017-02-01', '2017-03-01'],
    'value': [5.1, 5.4]
})

df2 = pd.DataFrame({
    'product': ['prod2', 'prod2'],
    'date': ['2017-02-01', '2017-04-01'],
    'value': [2.3, 2.4]
})

What I want dftrn and dftst to look like (indices don't matter):

dftrn
    date    product value
0   2017-02-01  prod1   5.2
1   2017-03-01  prod1   5.4
2   2017-02-01  prod2   2.3
3   2017-04-01  prod2   2.4

dftst
date    product value
0   2017-01-01  prod1   5.1
1   2017-03-01  prod2   2.2

BENY · Accepted Answer

Using concat with isin

s=pd.concat([df1,df2])
df[df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[77]: 
         date product  value
2  2017-03-01   prod1    5.4
3  2017-02-01   prod2    2.3
5  2017-04-01   prod2    2.4
df[~df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[78]: 
         date product  value
0  2017-01-01   prod1    5.1
1  2017-02-01   prod1    5.2
4  2017-03-01   prod2    2.2

Subset pandas DataFrame based on two columns in another DataFrame

Answers (1)

Related Questions