Most common elements in list-column

Question

I am using Python with pandas.

I have a dataframe like this:

     date         id         my_column
0    31.07.20     128909     ['hey', 'hi']
1    31.07.20     128914     ['hi']
3    31.07.20     853124     ['hi', 'hello', 'hey']
4    30.07.20     123456     ['hey']
...

The dataframe is over 1,000,000 rows long. I want the top 10 most common words in the my_column column.

Appreciate any help.

jezrael · Accepted Answer

Use Series.explode with Series.value_counts. The counts are sorted in descending order by default, so the top 10 words are the first 10 index values:

out = df['my_column'].explode().value_counts().index[:10].tolist()
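
As a rough illustration of the explode/value_counts approach, here is a minimal sketch that rebuilds the four sample rows from the question. It assumes each cell of my_column holds a real Python list (not the string "['hey', 'hi']"); the names counts, top10_with_counts and top10_words are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; each cell is assumed to be a real list.
df = pd.DataFrame({
    "date": ["31.07.20", "31.07.20", "31.07.20", "30.07.20"],
    "id": [128909, 128914, 853124, 123456],
    "my_column": [["hey", "hi"], ["hi"], ["hi", "hello", "hey"], ["hey"]],
})

# explode() produces one row per list element;
# value_counts() counts them and sorts by count, descending.
counts = df["my_column"].explode().value_counts()

top10_with_counts = counts.head(10)        # Series mapping word -> count
top10_words = counts.index[:10].tolist()   # only the words, most frequent first

print(top10_with_counts)
print(top10_words)
```

Keeping counts.head(10) instead of only the index is handy if the frequencies themselves are needed. Words with equal counts (here 'hi' and 'hey') may come out in either order.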

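Alternatively, use a pure Python solution to flatten the lists and count the top 10:

from collections import Counter
from itertools import chain

# Flatten all lists into one stream of words and count each word
c = Counter(chain.from_iterable(df['my_column']))
# most_common(10) returns (word, count) pairs; keep only the words
out = [word for word, count in c.most_common(10)]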
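
To see what the Counter variant returns, here is a small standard-library sketch. The list named rows is a hypothetical stand-in for df['my_column'] built from the sample data in the question; iterating the pandas column yields the same lists.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for df['my_column'], using the question's sample rows.
rows = [["hey", "hi"], ["hi"], ["hi", "hello", "hey"], ["hey"]]

# chain.from_iterable lazily flattens the nested lists,
# so no intermediate flat list is built before counting.
c = Counter(chain.from_iterable(rows))

pairs = c.most_common(10)            # list of (word, count) tuples
words = [word for word, _ in pairs]  # only the words

print(pairs)
print(words)
```

If the counts are needed as well, use pairs directly rather than discarding the second element of each tuple.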
