Reputation: 5670
I have the pandas DataFrame as shown below. df1
and df2
are both subsets of df
. I want to now create two new subsets of df
, let's call them dftrn
and dftst
. dftrn
should contain all values of df
that are in df1
and df2
. dftst
should contain values of df
that are not in df1
and df2
. What's the quickest way to do this?
df = pd.DataFrame({
'product': ['prod1', 'prod1', 'prod1', 'prod2', 'prod2', 'prod2'],
'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-04-01'],
'value': [5.1, 5.2, 5.4, 2.3, 2.2, 2.4]
})
df1 = pd.DataFrame({
'product': ['prod1', 'prod1'],
'date': ['2017-02-01', '2017-03-01'],
'value': [5.1, 5.4]
})
df2 = pd.DataFrame({
'product': ['prod2', 'prod2'],
'date': ['2017-02-01', '2017-04-01'],
'value': [2.3, 2.4]
})
What I want dftrn
and dftst
to look like (indices don't matter):
dftrn
date product value
0 2017-02-01 prod1 5.2
1 2017-03-01 prod1 5.4
2 2017-02-01 prod2 2.3
3 2017-04-01 prod2 2.4
dftst
date product value
0 2017-01-01 prod1 5.1
1 2017-03-01 prod2 2.2
Upvotes: 0
Views: 1602
Reputation: 323396
Using concat
with isin
s=pd.concat([df1,df2])
df[df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[77]:
date product value
2 2017-03-01 prod1 5.4
3 2017-02-01 prod2 2.3
5 2017-04-01 prod2 2.4
df[~df.apply(tuple,1).isin(s.apply(tuple,1))]
Out[78]:
date product value
0 2017-01-01 prod1 5.1
1 2017-02-01 prod1 5.2
4 2017-03-01 prod2 2.2
Upvotes: 1