aleksandra
aleksandra

Reputation: 1

Is there a proper way to convert dataframe column containing in each row comma separated strings to a dictionary?

ID Genre
1 Drama, Comedy
2 Action, Science Fiction, Suspense
3 Horror, Suspense
4 Comedy
5 Action, Drama, Sports
6 Comedy, Science Fiction
7 Drama

I have a Pandas dataframe with around 10000 rows, these above are the first 7th rows. I would like to create another dataframe or a dictionary that will count the occurrence of every genre existing in the dataset.

Something like this:

Genre Count
Drama 3
Comedy 3
Action 2
Science Fiction 2
Suspense 2
Sports 1

Ps. In the original df there are more then this 6 genres.

I have an idea how to do it iteratively with a for loop:

my_dict = {} 
for elem in df['Genre']:
    for genre in elem.split(','):
        if genre in my_dict:
            my_dict[genre] += 1
        else:
            my_dict[genre] = 1

However, is there any way to achieve this using NumPy or Pandas functions? As with 10000 rows it takes a lot of time.

Upvotes: 0

Views: 25

Answers (1)

mozway
mozway

Reputation: 261860

You can use:

df['Genre'].str.get_dummies(', ').sum()

Output:

Action             2
Comedy             3
Drama              3
Horror             1
Science Fiction    2
Sports             1
Suspense           2
dtype: int64

Or, maybe less efficient:

df['Genre'].str.split(',\s*').explode().value_counts()

Output:

Drama              3
Comedy             3
Action             2
Science Fiction    2
Suspense           2
Horror             1
Sports             1
Name: Genre, dtype: int64

Upvotes: 1

Related Questions