Reputation: 13
I have a pandas dataframe containing customer events with various columns. Some events appear more than once. I wanted to put all those events in a list. I did this:
dup_evets=[df_in['EVENTS'].value_counts()>1]
This put every event in the list, each marked True or False depending on whether it appears more than once.
How do I remove the False ones from the list?
Upvotes: 1
Views: 832
Reputation: 9197
You can do this:
df_in[df_in['EVENTS'].duplicated()]['EVENTS'].tolist()
Explained:
# Returns a boolean Series (a mask): True for each repeat of an event after its first occurrence.
mask = df_in['EVENTS'].duplicated()
# Slice (filter) dataframe based on boolean series, only returning the True ones
df_in[mask]
# Get column you are interested in
df_in[mask]['EVENTS']
# Return list of the values in it
df_in[mask]['EVENTS'].tolist()
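To make the steps above concrete, here is a minimal sketch on made-up data. Only the `EVENTS` column name comes from the question; the sample values and the names `repeats` and `unique_repeats` are assumptions for illustration. Note that `duplicated()` by default leaves the first occurrence unflagged, so an event that appears three times shows up twice in the result. Adding `drop_duplicates()` is one way to get each repeated event only once.

```python
import pandas as pd

# Hypothetical sample data; only the 'EVENTS' column name is from the question.
df_in = pd.DataFrame(
    {'EVENTS': ['login', 'click', 'login', 'purchase', 'click', 'login']}
)

# True for every occurrence after the first (keep='first' is the default).
mask = df_in['EVENTS'].duplicated()

# Select the flagged rows and the column in one step, then convert to a list.
repeats = df_in.loc[mask, 'EVENTS'].tolist()
print(repeats)  # ['login', 'click', 'login']

# Keep each repeated event only once.
unique_repeats = df_in.loc[mask, 'EVENTS'].drop_duplicates().tolist()
print(unique_repeats)  # ['login', 'click']
```

Using `df_in.loc[mask, 'EVENTS']` gives the same values as `df_in[mask]['EVENTS']` while doing the row and column selection in a single indexing call.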
If you want a different threshold rather than only duplicates (for example, events that appear more than 3 times), you can use this and change the `>1` as needed:
df_in[df_in.groupby(['EVENTS'])['EVENTS'].transform('count')>1]['EVENTS'].tolist()
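As a rough sketch of the count-based approach, again on made-up data: the sample values, the `min_count` threshold, and the variable names are assumptions, not part of the original post. The second half shows a variant closer to the attempt in the question, filtering the `value_counts()` result by itself so that only the events with a True check remain.

```python
import pandas as pd

# Hypothetical sample data; only the 'EVENTS' column name is from the question.
df_in = pd.DataFrame(
    {'EVENTS': ['login', 'click', 'login', 'purchase', 'click', 'login']}
)

min_count = 2  # hypothetical threshold: keep events appearing more than this

# For each row, how many times that row's event occurs in the whole column.
counts_per_row = df_in.groupby('EVENTS')['EVENTS'].transform('count')

# Every row whose event exceeds the threshold (repeats are kept).
rows = df_in.loc[counts_per_row > min_count, 'EVENTS'].tolist()
print(rows)  # ['login', 'login', 'login']

# Variant based on value_counts(): one entry per event, False ones dropped.
counts = df_in['EVENTS'].value_counts()
dup_events = counts[counts > 1].index.tolist()
print(dup_events)  # ['login', 'click']
```

The `transform('count')` version returns one entry per matching row, while the `value_counts()` version returns each qualifying event once, which may be closer to what the question asks for.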
Upvotes: 1