Hannah

Reputation: 367

Count occurrence of elements in column of lists (with a twist)

I've got a column of lists called "author_background" which I would like to analyze. The actual column consists of 8,000 rows. My aim is to get an overview of how many different elements there are in total (across all lists of the column) and to count how many lists each element occurs in.

What my column looks like:

df.author_background

0    [Professor for Business Administration, Harvard Business School]
1    [Professor for Industrial Engineering, University of Oakland]
2    [Harvard Business School]
3    [CEO, SpaceX]

Desired output:

0     Harvard Business School                   2
1     Professor for Business Administration     1
2     Professor for Industrial Engineering      1
3     CEO                                       1
4     University of Oakland                     1 
5     SpaceX                                    1

I would like to know how often "Professor for Business Administration", "Professor for Industrial Engineering", "Harvard Business School", etc. occur in the column. There are many more titles that I don't know about.

Basically, I would like to use pd.value_counts on the column. However, that's not possible because each entry is a list.

Is there another way to count the occurrence of each element?

If that's more helpful: I also have a flat (non-nested) list which contains all elements of the lists.

Upvotes: 0

Views: 544

Answers (2)

John Zwinck

Reputation: 249153

Turn it all into a single series by list flattening:

pd.Series([bg for bgs in df.author_background for bg in bgs])

Now you can call value_counts() to get your result.
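For illustration, here is a minimal end-to-end sketch of the flattening approach above, using the four sample rows from the question. The names `flat`, `counts` and `per_list` are made up for this example. The `set(bgs)` variant is an assumption about the intent: the question asks in how many lists each element occurs, so it counts each element at most once per row. On the sample data both variants give the same result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"author_background": [
    ["Professor for Business Administration", "Harvard Business School"],
    ["Professor for Industrial Engineering", "University of Oakland"],
    ["Harvard Business School"],
    ["CEO", "SpaceX"],
]})

# Flatten the column of lists into one Series, then count every occurrence
flat = pd.Series([bg for bgs in df.author_background for bg in bgs])
counts = flat.value_counts()

# Variant: count each element at most once per list (number of lists containing it)
per_list = pd.Series(
    [bg for bgs in df.author_background for bg in set(bgs)]
).value_counts()

print(counts)
```

With this data, "Harvard Business School" has a count of 2 and the other five elements a count of 1 each. The order among tied entries may differ from the desired output in the question.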

Upvotes: 1

Joe

Reputation: 12417

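You can try this:

# Flatten all lists of the column into a single Series
el = pd.Series([item for sublist in df.author_background for item in sublist])
# Count each distinct element; note that this overwrites the original df
df = el.groupby(el).size().rename_axis('author_background').reset_index(name='counter')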
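As a sketch of what this produces, the snippet below runs the same groupby on the sample rows from the question. Two things differ from the code above and are my own choices rather than part of the answer: the result is stored under a separate, made-up name `counts` so the original `df` is kept, and a final sort by `counter` is added to match the ordering of the desired output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"author_background": [
    ["Professor for Business Administration", "Harvard Business School"],
    ["Professor for Industrial Engineering", "University of Oakland"],
    ["Harvard Business School"],
    ["CEO", "SpaceX"],
]})

# Flatten, then group the Series by its own values and take the group sizes
el = pd.Series([item for sublist in df.author_background for item in sublist])
counts = (
    el.groupby(el).size()
    .rename_axis("author_background")
    .reset_index(name="counter")
    # Extra step (not in the answer): most frequent elements first
    .sort_values("counter", ascending=False)
    .reset_index(drop=True)
)

print(counts)
```

The result is a DataFrame with the columns `author_background` and `counter`, one row per distinct element.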

Upvotes: 0
