Robo Bot
Robo Bot

Reputation: 51

Create multiple columns from a data frame using python

I have a csv file, showing below:

enter image description here

I am trying to create column for each title and also trying to create columns for each type of genre_and_votes so that the output is something like below : enter image description here

My code is given below:

import pandas as pd
df = pd.read_csv("C:\\Users\\mysite\\Desktop\\practice\\book1.csv")
#print(df)
 print(df['Title'].values,df['genre_and_votes'].values)

Now for the code above, it creates a df but not be able to create coulmns for each genre and votes, I am not sure how to do this now, need help.

Upvotes: 1

Views: 117

Answers (3)

mozway
mozway

Reputation: 262484

Here is a solution using extractall, a regex with named capturing groups, and pivot:

(df.join(df['genre_and_votes'].str.extractall('(?P<genre>[^,]+) (?P<value>\d+)').droplevel('match'))
   .pivot(index='title', columns='genre', values='value')
)

output:

genre              Mystery  Romance Christian Fiction Young adult
title                                                            
A Time to Embrace      NaN       16               114         NaN
Inner Circle            45       32               NaN         161

Upvotes: 1

Corralien
Corralien

Reputation: 120559

Use str.split and str.rsplit before pivot your dataframe and merge new columns with your original dataframe:

Setup a MRE

df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
                   'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
                                       'Christian Fiction 114, Romance 16']})
print(df)


# Output
               title                          genre_and_votes
0       Inner Circle  Young adult 161, Mystery 45, Romance 32
1  A Time to Embrace        Christian Fiction 114, Romance 16

Code:

out = df['genre_and_votes'].str.split(',').explode() \
                           .str.rsplit(' ', 1, expand=True) \
                           .pivot(columns=0, values=1)

df = pd.concat([df.drop(columns='genre_and_votes'), out], axis=1)

Final output

>>> df
               title  Mystery  Romance Christian Fiction Young adult
0       Inner Circle       45       32               NaN         161
1  A Time to Embrace      NaN       16               114         NaN

Upvotes: 2

Related Questions