Reputation: 343
How to check whether a pandas DataFrame is a subset of another DataFrame
I can compare two DataFrames when their indexes match, but in my case the matching rows have different indexes:
import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = pd.DataFrame({"col1": ["tomato", "apple"],
                    "col2": ["dog", "kangoo"],
                    "col3": ["phone", "ps4"]})
ex2.isin(ex).all().all()
>>> False
I want the check above to return True, because every row of ex2 also appears in ex. Currently isin only compares rows that share the same index label. How can I make it ignore the index?
Upvotes: 4
Views: 1029
Reputation: 863791
One possible solution is to use merge on all columns (omit the on parameter) and then call isin on the result to compare it against the subset:
print (ex2.merge(ex).isin(ex2))
col1 col2 col3
0 True True True
1 True True True
print (ex2.merge(ex).isin(ex2).all().all())
True
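One caveat with the check above: isin still aligns on the index of the merged result, so if the rows missing from ex are the last rows of the candidate frame, the merge just returns a shorter prefix and the check still reports True. A minimal sketch of a stricter variant follows, assuming both frames have the same columns and that duplicate rows do not matter. The helper name is_row_subset and the extra frame ex3 are made up for illustration.

```python
import pandas as pd


def is_row_subset(small: pd.DataFrame, big: pd.DataFrame) -> bool:
    """Return True if every row of `small` also appears in `big`, ignoring the index."""
    cols = small.columns.tolist()
    # Deduplicate both sides so the inner merge cannot multiply rows.
    small_rows = small.drop_duplicates()
    big_rows = big[cols].drop_duplicates()
    # An inner merge on all columns keeps only the rows present in both frames.
    merged = small_rows.merge(big_rows, on=cols)
    # If the merge dropped nothing, every row of `small` exists in `big`.
    return len(merged) == len(small_rows)


ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = pd.DataFrame({"col1": ["tomato", "apple"],
                    "col2": ["dog", "kangoo"],
                    "col3": ["phone", "ps4"]})
# Hypothetical frame: ex2 plus one trailing row that does not exist in ex.
extra = pd.DataFrame({"col1": ["pear"], "col2": ["bird"], "col3": ["radio"]})
ex3 = pd.concat([ex2, extra], ignore_index=True)

print(is_row_subset(ex2, ex))                 # True
print(is_row_subset(ex3, ex))                 # False
print(ex3.merge(ex).isin(ex3).all().all())    # True, although "pear" is not in ex
```

Comparing row counts instead of cell values avoids any dependence on how the merged index happens to line up with the original one.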
Another idea is to compare MultiIndexes built from all columns:
i1 = ex2.set_index(ex2.columns.tolist()).index
i2 = ex.set_index(ex.columns.tolist()).index
print (i1.isin(i2).all())
True
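The same MultiIndex comparison can also tell you which rows are missing, not just whether the subset check passes. The sketch below assumes both frames share the same column names. It reorders the columns of the larger frame first, because the row tuples are compared positionally. The helper name missing_rows and the frame ex3 are hypothetical.

```python
import pandas as pd


def missing_rows(small: pd.DataFrame, big: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `small` that do not appear in `big`, ignoring the index."""
    cols = small.columns.tolist()
    # Turn every row into an index key; use the same column order on both sides.
    small_keys = small.set_index(cols).index
    big_keys = big[cols].set_index(cols).index
    # Boolean array: True where the row of `small` is found in `big`.
    found = small_keys.isin(big_keys)
    return small[~found]


ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = pd.DataFrame({"col1": ["tomato", "apple"],
                    "col2": ["dog", "kangoo"],
                    "col3": ["phone", "ps4"]})
# Hypothetical frame with one row that is not in ex.
ex3 = pd.DataFrame({"col1": ["tomato", "pear"],
                    "col2": ["dog", "bird"],
                    "col3": ["phone", "radio"]})

print(missing_rows(ex2, ex).empty)   # True: ex2 is a subset of ex
print(missing_rows(ex3, ex))         # shows the "pear" row
```

An empty result means the first frame is a row-wise subset of the second.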
Upvotes: 4