Zaibi

Reputation: 343

Find whether a DataFrame is a subset of another DataFrame, ignoring the index

I'm trying to find out whether one pandas DataFrame is a subset of another.

I can compare two DataFrames when their indexes match, but in my case the rows have different indexes.

import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = pd.DataFrame({"col1": ["tomato", "apple"],
                    "col2": ["dog", "kangoo"],
                    "col3": ["phone", "ps4"]})

ex2.isin(ex).all().all()
>>> False

I want the comparison above to come out as True. Currently it only matches rows that share the same index; how can I override this?

Upvotes: 4

Views: 1029

Answers (1)

jezrael

Reputation: 863791

One possible solution is to merge on all columns (omit the on parameter) and then use isin against the subset:

print (ex2.merge(ex).isin(ex2))
   col1  col2  col3
0  True  True  True
1  True  True  True

print (ex2.merge(ex).isin(ex2).all().all())
True
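
One caveat, as far as I can tell: isin with a DataFrame argument aligns on the index, and the merge result gets a fresh 0..n-1 index, so a row of ex2 that is missing from ex and sits at the end could be dropped by the merge and go unnoticed. A sketch of a variant that compares row counts instead is below. The helper name is_row_subset is made up for illustration, and it assumes both frames have the same columns.

```python
import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = pd.DataFrame({"col1": ["tomato", "apple"],
                    "col2": ["dog", "kangoo"],
                    "col3": ["phone", "ps4"]})


def is_row_subset(small, big):
    """Hypothetical helper: True if every row of `small` appears in `big`.

    Assumes both frames share the same columns; the index is ignored.
    """
    # Drop duplicate rows so the inner merge yields at most one match per row.
    small_rows = small.drop_duplicates()
    big_rows = big.drop_duplicates()
    # With no `on` argument, merge joins on all common columns.
    common = small_rows.merge(big_rows, how="inner")
    # Every distinct row of `small` found a match only if no row was lost.
    return len(common) == len(small_rows)


print(is_row_subset(ex2, ex))
```

Deduplicating first means repeated rows in either frame do not inflate the merged row count.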

Another idea is to compare MultiIndexes:

i1 = ex2.set_index(ex2.columns.tolist()).index
i2 = ex.set_index(ex.columns.tolist()).index

print (i1.isin(i2).all())
True
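
If you also want to see which rows fail the check, the same MultiIndex comparison can be used as a boolean mask. This is only a sketch: rows_missing_from is a hypothetical helper name, ex3 is made-up sample data, and it assumes big contains every column of small.

```python
import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
# Made-up example: the last row does not exist in `ex`.
ex3 = pd.DataFrame({"col1": ["tomato", "apple", "pear"],
                    "col2": ["dog", "kangoo", "fox"],
                    "col3": ["phone", "ps4", "radio"]})


def rows_missing_from(small, big):
    """Hypothetical helper: rows of `small` with no matching row in `big`.

    The index of both frames is ignored; rows are compared by value.
    """
    cols = small.columns.tolist()
    # Turn every row into an index entry (a tuple of its column values).
    small_idx = small.set_index(cols).index
    # Select `cols` from `big` so both indexes use the same column order.
    big_idx = big[cols].set_index(cols).index
    # isin returns a boolean array; invert it to keep the unmatched rows.
    return small[~small_idx.isin(big_idx)]


missing = rows_missing_from(ex3, ex)
print(missing)
print(missing.empty)  # an empty result would mean ex3 is a subset of ex
```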

Upvotes: 4
