Reputation: 499
I have an emails DataFrame that I filter with this query:
williams = emails[emails["employee"] == "kean-s"]
This selects all rows where employee is kean-s. Then I count the frequencies and print the ten most common values, like this:
williams["X-Folder"].value_counts()[:10]
This gives output like this:
attachments 2026
california 682
heat wave 244
ferc 188
pr-crisis management 92
federal legislation 88
rto 78
india 75
california - working group 72
environmental issues 71
Now I need to print all rows from emails whose X-Folder column equals one of these values (attachments, california, heat wave, etc.). How do I go about it? When I print values[0], it returns only the frequency, not the term corresponding to it. (I tried printing it because, if I can loop through the terms, I'll just put a condition on the DataFrame.)
Upvotes: 0
Views: 65
Reputation: 862641
Use Series.isin with boolean indexing, testing against the index of the value_counts result (the index holds the folder names, the values hold the counts):
df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts()[:10].index)]
Or:
df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
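To see why values[0] printed a number in the question, here is a minimal sketch on a small made-up DataFrame (the rows are invented for illustration, not the real emails data). It uses .head(2) in place of the [:10] slice, which should select the same leading entries of the sorted counts:

```python
import pandas as pd

# Hypothetical stand-in for the filtered frame from the question
williams = pd.DataFrame({
    "X-Folder": ["attachments", "california", "attachments",
                 "ferc", "attachments", "california"],
})

counts = williams["X-Folder"].value_counts()  # sorted by frequency, descending
top2 = counts.head(2)

top_labels = top2.index.tolist()   # the folder names live in the index
top_freqs = top2.values.tolist()   # the frequencies live in the values

# Keep only rows whose folder is one of the top labels
subset = williams[williams["X-Folder"].isin(top_labels)]
print(top_labels, top_freqs, len(subset))
```

So looping over the index (not the values) gives the terms, although isin makes the loop unnecessary.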
If you need to filter all rows of the original DataFrame (including rows whose employee is not kean-s), use:
df1 = emails[emails["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
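The difference between filtering williams and filtering emails can be checked on a toy example. The data below, including the second employee name lay-k, is made up for illustration, and only the single most frequent folder is kept to keep it short:

```python
import pandas as pd

# Hypothetical miniature of the emails DataFrame
emails = pd.DataFrame({
    "employee": ["kean-s", "kean-s", "kean-s", "lay-k", "lay-k"],
    "X-Folder": ["attachments", "attachments", "ferc", "attachments", "india"],
})

williams = emails[emails["employee"] == "kean-s"]

# Most frequent folder among kean-s rows (top 1 instead of top 10)
top = williams["X-Folder"].value_counts().index[:1]

df = williams[williams["X-Folder"].isin(top)]   # only kean-s rows
df1 = emails[emails["X-Folder"].isin(top)]      # rows from every employee

print(len(df), len(df1))
```

Here df keeps the two kean-s rows in attachments, while df1 also picks up the lay-k row in the same folder.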
Upvotes: 1