Reputation: 367
I've got a column of lists called "author_background" which I would like to analyze. The actual column consists of 8.000 rows. My aim is to get an overview on how many different elements there are in total (in all lists of the column) and count in how many lists each element occurs in.
How my column looks like:
df.author_background
0 [Professor for Business Administration, Harvard Business School]
1 [Professor for Industrial Engineering, University of Oakland]
2 [Harvard Business School]
3 [CEO, SpaceX]
desired output
0 Harvard Business School 2
1 Professor for Business Administration 1
2 Professor for Industrial Engineering 1
3 CEO 1
4 University of Oakland 1
5 SpaceX 1
I would like to know how often "Professor of Business Administration", "Professor for Industrial Engineering", "Harvard Business School", etc. occurs in the column. There are way more titles I don't know about.
Basically, I would like to use pd.value_counts for the column. However, its not possible because its a list.
Is there another way to count the occurrence of each element?
If thats more helpful: I also got a list which contains all elements of the lists (not nested).
Upvotes: 0
Views: 544
Reputation: 249153
Turn it all into a single series by list flattening:
pd.Series([bg for bgs in df.author_background for bg in bgs])
Now you can call value_counts()
to get your result.
Upvotes: 1
Reputation: 12417
You can try so:
el = pd.Series([item for sublist in df.author_background for item in sublist])
df = el.groupby(el).size().rename_axis('author_background').reset_index(name='counter')
Upvotes: 0