How to select rows in a DataFrame based on a condition

Question

I have an emails DataFrame that I filter with this query:

williams = emails[emails["employee"] == "kean-s"]

This selects all rows where employee is kean-s. I then count the frequencies in X-Folder and print the ten most common:

williams["X-Folder"].value_counts()[:10]

This gives output like this:

attachments                   2026
california                     682
heat wave                      244
ferc                           188
pr-crisis management            92
federal legislation             88
rto                             78
india                           75
california - working group      72
environmental issues            71

Now I need to print all rows from emails whose X-Folder column equals attachments, california, heat wave, etc. How do I go about it? When I print values[0], it returns only the frequency, not the term it belongs to. (I tried printing it because, if I could loop through the terms, I'd just put a condition inside the DataFrame selection.)

jezrael · Accepted Answer

Use Series.isin with boolean indexing, passing the index of the value_counts result (the index holds the folder names):

df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts()[:10].index)]

Or:

df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
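
As a minimal sketch of why values[0] returned a number: value_counts returns a Series whose values are the counts and whose index holds the counted terms. The folders Series below is hypothetical toy data standing in for williams["X-Folder"].

```python
import pandas as pd

# Hypothetical folder labels standing in for williams["X-Folder"].
folders = pd.Series(
    ["attachments", "california", "attachments", "ferc", "california", "attachments"]
)

counts = folders.value_counts()

# The counts are the values; the folder names are the index.
top_count = counts.iloc[0]   # frequency of the most common folder
top_label = counts.index[0]  # name of the most common folder

# Names of the two most frequent folders, most frequent first.
top_two = counts.index[:2].tolist()

print(top_count, top_label, top_two)
```

Slicing the index (counts.index[:2]) or slicing the Series and then taking its index (counts[:2].index) should give the same labels, which is why both variants above work.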

If you need to filter all rows of the original DataFrame (including rows for employees other than kean-s), use:

df1 = emails[emails["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
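
To make the difference between the two filters concrete, here is a small self-contained sketch. The data is made up for illustration (only the column names and kean-s come from the question; lay-k and the rows are hypothetical), and it keeps the top 2 folders instead of the top 10 to stay short.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["lay-k"] * 3,
    "X-Folder": [
        "attachments", "attachments", "attachments",
        "california", "california", "ferc",
        "attachments", "california", "inbox",
    ],
})

williams = emails[emails["employee"] == "kean-s"]

# Names of the two most frequent folders among kean-s's rows.
top_folders = williams["X-Folder"].value_counts().index[:2]

# Rows of kean-s only, restricted to those folders.
df = williams[williams["X-Folder"].isin(top_folders)]

# Rows of every employee, restricted to the same folders.
df1 = emails[emails["X-Folder"].isin(top_folders)]

print(len(df), len(df1))
```

In this sketch df keeps only kean-s's rows, while df1 also picks up rows from other employees whose folder is one of kean-s's top folders.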
