Reputation: 23
I want to count number of DataFrame rows which contains all of the values from the list.
Lets say I've got:
abc = pd.DataFrame(np.random.randint(1,100,(100, 10)))
and I've got following list (with unique values):
x = [10, 32, 8]
What is the simplest way, using pandas, to chck if row contains all given values? I could probably convert each row to list and than compare content of new list with the given list and increment my counter if comparison results with success, but I guess it is not the fastest way in this case...
Upvotes: 1
Views: 437
Reputation: 30920
it is necessary to mask duplicates, then we can use DataFrame.isin
and DataFrame.sum
to count the number of matches per row and check if it is greater than or equal to the length of the list with DataFrame.ge
, which will be sufficient since there are no duplicates.
finally we obtain the sum of the rows that meet this condition
df=pd.DataFrame(np.sort(abc, axis=1))
df.where(df.ne(df.shift(axis=1))).isin(x).sum(axis=1).ge(len(x)).sum()
If not duplicates in rows:
abc.isin(x).sum(axis=1).ge(len(x)).sum()
References to mask duplicates:
DataFrame.ne
, df1.ne(df2)
is similar to df1 != df2
Upvotes: 2
Reputation: 323226
Something like
abc.apply(lambda y : pd.Series(x).isin(y).all(), axis=1).sum()
4
Upvotes: 1
Reputation: 38415
You can use numpy setdiff of the array and row, if the len is zero, all elements are present.
abc.apply(lambda row: len(np.setdiff1d(x, row)) == 0, axis = 1)
Upvotes: 0
Reputation: 13387
You can do:
# in case x is not unique
# x=set(x)
df=abc.stack().reset_index(level=1, drop=True).reset_index().drop_duplicates().rename(columns={0: "num"})
cnt=df.loc[df["num"].isin(x)].groupby("index")["num"].count().eq(len(x)).sum()
print(cnt)
Upvotes: 0