Reputation: 71
Dataframe:
MovieID | movieCater | rating |
---|---|---|
1 | Action, Comedy, Adventure | 4 |
2 | Action, Crime | 3 |
3 | Crime | 2 |
What I want:
MovieID | movieCater | Action | Comedy | Adventure | Crime |
---|---|---|---|---|---|
1 | Action, Comedy, Adventure | 4 | 4 | 4 | 0 |
2 | Action, Crime | 3 | 0 | 0 | 3 |
3 | Crime | 0 | 0 | 0 | 2 |
Upvotes: 2
Views: 88
Reputation: 352
Suppose the input dataframe is
df = pd.DataFrame({
'MovieID': ['001','002','003'],
'movieCat': ['Action, Comedy, Adventure', 'Action, Crime', 'Crime'],
'rating': [4,3,2]
})
#output
MovieID movieCat rating
0 001 Action, Comedy, Adventure 4
1 002 Action, Crime 3
2 003 Crime 2
I've re-use similar codes of @sushanth to get the final table
df['temp'] = df['movieCat'].str.split(", ")
df = df.explode(column='temp').pivot_table(index=['MovieID', 'movieCat'], columns='temp', values='rating', fill_value=0)
df.columns.name=None
df.reset_index(inplace=True)
#output
MovieID movieCat Adventure Comedy Crime Action
0 001 Action, Comedy, Adventure 4 4 0 4
1 002 Action, Crime 0 0 3 3
2 003 Crime 0 0 2 0
After this you could export the table to Excel
df.to_excel('my_file.xlsx', index=False)
Upvotes: 0
Reputation: 75080
You can also get_dummies
for the column movieCater
and multiply the rating:
out = df.join(df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))
print(out)
MovieID movieCater rating Action Adventure Comedy Crime
0 1 Action,Comedy,Adventure 4 4 4 4 0
1 2 Action,Crime 3 3 0 0 3
2 3 Crime 2 0 0 0 2
To match the expected output, drop the rating column before joining:
out = (df.drop("rating",1).join(
df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))
If the series is very very big, you might also consider splitting the strings with the sep="," then use this solution: https://stackoverflow.com/a/51420716/9840637 to get the dummies. Finally multiply the rating column.
Upvotes: 4
Reputation: 1213
Here's an approach using .pivot_table()
method.
First of all, you have to convert movieCater
column to a list using .apply()
method.
>>> df
MovieID movieCater rating
0 1 Action, Comedy, Adventure 4
1 2 Action, Crime 3
2 3 Crime 2
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', ')))
MovieID movieCater rating movieCater_list
0 1 Action, Comedy, Adventure 4 [Action, Comedy, Adventure]
1 2 Action, Crime 3 [Action, Crime]
2 3 Crime 2 [Crime]
Next, use .explode()
method on the movieCater_list
column.
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list')
MovieID movieCater rating movieCater_list
0 1 Action, Comedy, Adventure 4 Action
0 1 Action, Comedy, Adventure 4 Comedy
0 1 Action, Comedy, Adventure 4 Adventure
1 2 Action, Crime 3 Action
1 2 Action, Crime 3 Crime
2 3 Crime 2 Crime
Finally use .pivot_table()
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list').pivot_table(values='rating', index=['MovieID', 'movieCater'], columns='movieCater_list', fill_value=0)
movieCater_list Action Adventure Comedy Crime
MovieID movieCater
1 Action, Comedy, Adventure 4 4 4 0
2 Action, Crime 3 0 0 3
3 Crime 0 0 0 2
Upvotes: 2
Reputation: 8302
Here is a solution you can try out, first split
based on delimiter then explode
finally pivot_table
print(
df.assign(movieCater=df['movieCater'].str.split(","))
.explode(column='movieCater')
.pivot_table(index='MovieID', columns='movieCater', values='rating', fill_value=0)
)
movieCater Action Adventure Comedy Crime
MovieID
1 4 4 4 0
2 3 0 0 3
3 0 0 0 2
Upvotes: 1