Reputation: 2492
I have a two dataframes. I want to see if specific rows (in its entirety) exist in the other dataframe. Example rows from df_subset:
id category value date
1 A 10 01-01-15
3 C 10 03-01-15
The other df_full:
id category value date
1 A 10 01-01-15
2 B 10 02-01-15
3 C 10 03-01-15
4 D 16 04-01-15
Is there someway to check if the rows of one dataframe exists in another? Something like this (obviously this does not work): df_subset in df_full
, exist?
> True
Upvotes: 5
Views: 6198
Reputation: 294526
using numpy
(df_subset.values[:, None] == df_full.values).all(2).any(1).all()
True
explanation
# using [:, None] to extend into new dimension at
# take advantage of broadcasting
a1 = df_subset.values[:, None] == df_full.values
# ━> third dimension ━>
# ━━━━> axis=2 ━━━>
# 1st dim
---->[[[ True True True True] # │
[False False True False] # │ second dimension
[False False True False] # │ axis=1
[False False False False]] # ↓
# axis=0
---->[[False False True False] # │
[False False True False] # │ second dimension
[ True True True True] # │ axis=1
[False False False False]]] # ↓
# first row of subset with each row of full
[[[ True True True True] <-- This one is true for all
[False False True False]
[False False True False]
[False False False False]]
# second row of subset with each row of full
[[False False True False]
[False False True False]
[ True True True True] <-- This one is true for all
[False False False False]]]
a2 = a1.all(2)
# ┌─ first row of subset all equal
[[ True False False False]
[False False True False]]
# └─ second row of subset all equal
a3 = a2.any(1)
# ┌─ first row of subset matched at least one row of full
[ True True]
# └─ second row of subset matched at least one row of full
a3.all()
True
ALL ROWS OF df_subset
ARE IN df_full
Upvotes: 3
Reputation: 863611
I think you can use merge
with inner join (by default) with DataFrame.equals
for compare with df_subset
:
print (pd.merge(df_subset,df).equals(df_subset))
True
Upvotes: 4
Reputation: 210972
you can use merge(..., indicator=True) method:
In [14]: pd.merge(df1, df2, indicator=True, how='outer')
Out[14]:
id category value date _merge
0 1 A 10 01-01-15 both
1 3 C 10 03-01-15 both
2 2 B 10 02-01-15 right_only
3 4 D 16 04-01-15 right_only
Upvotes: 3