Reputation: 51
I have a csv file, showing below:
I am trying to create column for each title and also trying to create columns for each type of genre_and_votes so that the output is something like below :
My code is given below:
import pandas as pd
df = pd.read_csv("C:\\Users\\mysite\\Desktop\\practice\\book1.csv")
#print(df)
print(df['Title'].values,df['genre_and_votes'].values)
Now for the code above, it creates a df but not be able to create coulmns for each genre and votes, I am not sure how to do this now, need help.
Upvotes: 1
Views: 117
Reputation: 262484
Here is a solution using extractall
, a regex with named capturing groups, and pivot
:
(df.join(df['genre_and_votes'].str.extractall('(?P<genre>[^,]+) (?P<value>\d+)').droplevel('match'))
.pivot(index='title', columns='genre', values='value')
)
output:
genre Mystery Romance Christian Fiction Young adult
title
A Time to Embrace NaN 16 114 NaN
Inner Circle 45 32 NaN 161
Upvotes: 1
Reputation: 120559
Use str.split
and str.rsplit
before pivot
your dataframe and merge new columns with your original dataframe:
Setup a MRE
df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
'Christian Fiction 114, Romance 16']})
print(df)
# Output
title genre_and_votes
0 Inner Circle Young adult 161, Mystery 45, Romance 32
1 A Time to Embrace Christian Fiction 114, Romance 16
Code:
out = df['genre_and_votes'].str.split(',').explode() \
.str.rsplit(' ', 1, expand=True) \
.pivot(columns=0, values=1)
df = pd.concat([df.drop(columns='genre_and_votes'), out], axis=1)
Final output
>>> df
title Mystery Romance Christian Fiction Young adult
0 Inner Circle 45 32 NaN 161
1 A Time to Embrace NaN 16 114 NaN
Upvotes: 2
Reputation: 66
There is a "pivot" function https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
Upvotes: 0