Reputation: 1
ID | Genre |
---|---|
1 | Drama, Comedy |
2 | Action, Science Fiction, Suspense |
3 | Horror, Suspense |
4 | Comedy |
5 | Action, Drama, Sports |
6 | Comedy, Science Fiction |
7 | Drama |
I have a Pandas dataframe with around 10000 rows, these above are the first 7th rows. I would like to create another dataframe or a dictionary that will count the occurrence of every genre existing in the dataset.
Something like this:
Genre | Count |
---|---|
Drama | 3 |
Comedy | 3 |
Action | 2 |
Science Fiction | 2 |
Suspense | 2 |
Sports | 1 |
Ps. In the original df there are more then this 6 genres.
I have an idea how to do it iteratively with a for loop:
my_dict = {}
for elem in df['Genre']:
for genre in elem.split(','):
if genre in my_dict:
my_dict[genre] += 1
else:
my_dict[genre] = 1
However, is there any way to achieve this using NumPy or Pandas functions? As with 10000 rows it takes a lot of time.
Upvotes: 0
Views: 25
Reputation: 261860
You can use:
df['Genre'].str.get_dummies(', ').sum()
Output:
Action 2
Comedy 3
Drama 3
Horror 1
Science Fiction 2
Sports 1
Suspense 2
dtype: int64
Or, maybe less efficient:
df['Genre'].str.split(',\s*').explode().value_counts()
Output:
Drama 3
Comedy 3
Action 2
Science Fiction 2
Suspense 2
Horror 1
Sports 1
Name: Genre, dtype: int64
Upvotes: 1