jack.sparrow
jack.sparrow

Reputation: 23

Count number of DataFrame rows which contains all of the values from the list using pandas

I want to count number of DataFrame rows which contains all of the values from the list.

Lets say I've got:

abc = pd.DataFrame(np.random.randint(1,100,(100, 10)))

and I've got following list (with unique values):

x = [10, 32, 8]

What is the simplest way, using pandas, to chck if row contains all given values? I could probably convert each row to list and than compare content of new list with the given list and increment my counter if comparison results with success, but I guess it is not the fastest way in this case...

Upvotes: 1

Views: 437

Answers (4)

ansev
ansev

Reputation: 30920

it is necessary to mask duplicates, then we can use DataFrame.isin and DataFrame.sum to count the number of matches per row and check if it is greater than or equal to the length of the list with DataFrame.ge, which will be sufficient since there are no duplicates. finally we obtain the sum of the rows that meet this condition

df=pd.DataFrame(np.sort(abc, axis=1))
df.where(df.ne(df.shift(axis=1))).isin(x).sum(axis=1).ge(len(x)).sum()

If not duplicates in rows:

abc.isin(x).sum(axis=1).ge(len(x)).sum()

References to mask duplicates:

Upvotes: 2

BENY
BENY

Reputation: 323226

Something like

abc.apply(lambda y : pd.Series(x).isin(y).all(), axis=1).sum()
4

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

You can use numpy setdiff of the array and row, if the len is zero, all elements are present.

abc.apply(lambda row: len(np.setdiff1d(x, row)) == 0, axis = 1)

Upvotes: 0

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

You can do:

# in case x is not unique
# x=set(x)

df=abc.stack().reset_index(level=1, drop=True).reset_index().drop_duplicates().rename(columns={0: "num"})

cnt=df.loc[df["num"].isin(x)].groupby("index")["num"].count().eq(len(x)).sum()

print(cnt)

Upvotes: 0

Related Questions