Leo
Leo

Reputation: 71

How can I match the value and assign to them a new column based on other column string

Dataframe:

MovieID movieCater rating
1 Action, Comedy, Adventure 4
2 Action, Crime 3
3 Crime 2

What I want:

MovieID movieCater Action Comedy Adventure Crime
1 Action, Comedy, Adventure 4 4 4 0
2 Action, Crime 3 0 0 3
3 Crime 0 0 0 2

Upvotes: 2

Views: 88

Answers (4)

MiH
MiH

Reputation: 352

Suppose the input dataframe is

df = pd.DataFrame({
    'MovieID': ['001','002','003'], 
    'movieCat': ['Action, Comedy, Adventure', 'Action, Crime', 'Crime'], 
    'rating': [4,3,2]
    })

#output
    MovieID movieCat                    rating
0   001     Action, Comedy, Adventure   4
1   002     Action, Crime               3
2   003     Crime                       2

I've re-use similar codes of @sushanth to get the final table

df['temp'] = df['movieCat'].str.split(", ")
df = df.explode(column='temp').pivot_table(index=['MovieID', 'movieCat'], columns='temp', values='rating', fill_value=0)
df.columns.name=None
df.reset_index(inplace=True)

#output
    MovieID movieCat                    Adventure   Comedy  Crime   Action
0   001     Action, Comedy, Adventure   4           4       0       4
1   002     Action, Crime               0           0       3       3
2   003     Crime                       0           0       2       0

After this you could export the table to Excel

df.to_excel('my_file.xlsx', index=False)

Upvotes: 0

anky
anky

Reputation: 75080

You can also get_dummies for the column movieCater and multiply the rating:

out = df.join(df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

print(out)

   MovieID               movieCater  rating  Action  Adventure  Comedy  Crime
0        1  Action,Comedy,Adventure       4       4          4       4      0
1        2             Action,Crime       3       3          0       0      3
2        3                    Crime       2       0          0       0      2

To match the expected output, drop the rating column before joining:

out = (df.drop("rating",1).join(
      df['movieCater'].str.get_dummies(",").mul(df['rating'],axis=0))

If the series is very very big, you might also consider splitting the strings with the sep="," then use this solution: https://stackoverflow.com/a/51420716/9840637 to get the dummies. Finally multiply the rating column.

Upvotes: 4

Gusti Adli
Gusti Adli

Reputation: 1213

Here's an approach using .pivot_table() method.

First of all, you have to convert movieCater column to a list using .apply() method.

>>> df
   MovieID                 movieCater  rating
0        1  Action, Comedy, Adventure       4
1        2              Action, Crime       3
2        3                      Crime       2
>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', ')))
   MovieID                 movieCater  rating              movieCater_list
0        1  Action, Comedy, Adventure       4  [Action, Comedy, Adventure]
1        2              Action, Crime       3              [Action, Crime]
2        3                      Crime       2                      [Crime]

Next, use .explode() method on the movieCater_list column.

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list')
   MovieID                 movieCater  rating movieCater_list
0        1  Action, Comedy, Adventure       4          Action
0        1  Action, Comedy, Adventure       4          Comedy
0        1  Action, Comedy, Adventure       4       Adventure
1        2              Action, Crime       3          Action
1        2              Action, Crime       3           Crime
2        3                      Crime       2           Crime

Finally use .pivot_table()

>>> df.assign(movieCater_list = df['movieCater'].apply(lambda x: x.split(', '))).explode('movieCater_list').pivot_table(values='rating', index=['MovieID', 'movieCater'], columns='movieCater_list', fill_value=0)
movieCater_list                    Action  Adventure  Comedy  Crime
MovieID movieCater
1       Action, Comedy, Adventure       4          4       4      0
2       Action, Crime                   3          0       0      3
3       Crime                           0          0       0      2

Upvotes: 2

sushanth
sushanth

Reputation: 8302

Here is a solution you can try out, first split based on delimiter then explode finally pivot_table

print(
    df.assign(movieCater=df['movieCater'].str.split(","))
        .explode(column='movieCater')
        .pivot_table(index='MovieID', columns='movieCater', values='rating', fill_value=0)
)

movieCater  Action  Adventure  Comedy  Crime
MovieID                                     
1                4          4       4      0
2                3          0       0      3
3                0          0       0      2

Upvotes: 1

Related Questions