Reputation: 52
I am trying to train an ML model to predict book genre based on movie titles, but since each movie has several genres listed, the accuracy of my model is very low because it cannot match the genres properly. I want to keep only the first genre that appears in the 'Genre' column. How can I achieve this? I tried:
df['Genre'].split(',')[0]
But it does not seem to work.
Upvotes: 0
Views: 76
Reputation: 153
df['Top_Genre'] = df['Genre'].str.split(pat = ",", expand=True)[0]
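To show why the attempt in the question fails and what the `.str` accessor changes, here is a minimal sketch. The sample DataFrame (titles and genre strings) is made up for illustration and is not the asker's data. The `.str.strip()` call is an extra step, assuming the genres may have spaces after the commas.

```python
import pandas as pd

# Hypothetical sample data; the real column contents are an assumption.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genre": ["Action, Adventure", "Drama", "Comedy,Romance"],
})

# df['Genre'].split(',') raises AttributeError: a Series has no .split
# method. Element-wise string methods live under the .str accessor.
# n=1 splits only at the first comma; .str[0] picks the first piece.
df["Top_Genre"] = df["Genre"].str.split(",", n=1).str[0].str.strip()

print(df["Top_Genre"].tolist())
```

Using `.str[0]` instead of `expand=True` avoids building one column per genre when only the first one is needed. Both approaches should give the same first genre.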
Upvotes: 2
Reputation: 2311
You can write a function to do this, provided the Genre column holds a comma-separated string of genres:
def get_first_genre(x):
    return x.Genre.split(',')[0]

df["firstGenre"] = df.apply(get_first_genre, axis=1)
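One caveat: if any row has a missing Genre, calling `.split` on it fails because the value is not a string. Below is a sketch of a guarded variant, assuming missing genres should simply stay missing. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing genre.
df = pd.DataFrame({"Genre": ["Horror,Thriller", None, "Sci-Fi"]})

def get_first_genre(row):
    genre = row["Genre"]
    # Missing values (None/NaN) are not strings, so skip them.
    if not isinstance(genre, str):
        return None
    return genre.split(",")[0].strip()

# axis=1 passes each row to the function as a Series.
df["firstGenre"] = df.apply(get_first_genre, axis=1)
```

Row-wise `apply` tends to be slower than the vectorized `.str` methods on large frames, so this form is mainly useful when the per-row logic needs more than a simple split.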
Upvotes: 0
Reputation: 204
df['Genre'] = [ data.split(',')[0] for data in df['Genre']]
I hope this helps. Note that this overwrites the original 'Genre' column.
Upvotes: 1