Mariana

Reputation: 165

How can I use get_dummies() in this case?

I need to classify userId X movieId and I have two columns: userId and movieId.

userId  movieId
60265   2123
60265   2291
60265   2329
60265   2355
60265   2389
60265   2396
60265   2402
60265   2403
60265   2421
19254   2389
19254   2396
19254   2402
19254   2403
19254   2421
19254   2123
19254   2291
19254   2329

Each userId has watched more than one movieId. I intend to build a histogram-style table showing which movies each user has watched.

userId/movieId  2123  2291  2329  2355  2389  2396  2402  2403  2421  2592  2596
60265              1     1     1     1     1     1     1     1     1     0     0
19254              1     1     1     0     1     1     1     1     1     0     0

How can I use the get_dummies() function to construct a similar userId X movieId table?

Upvotes: 3

Views: 87

Answers (2)

BlueSheepToken

Reputation: 6099

Set userId as the index first, then call get_dummies. Here is the full code:

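import pandas as pd

data = {"movie": [2123, 2126, 2123], "userId": [1, 1, 2]}
df = pd.DataFrame(data).set_index('userId')

# One-hot encode the movie column; userId stays as the index
dummies = pd.get_dummies(df['movie'], prefix='movie')
# Swap the original column for the dummies and keep the result
# (drop(..., inplace=True) on the concat output would return None)
result = pd.concat([df, dummies], axis=1).drop(columns=['movie'])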
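
As a minimal sketch, assuming the question's column names (userId, movieId) rather than the toy names above, the same index-then-encode idea could be applied to a few of the question's rows. The groupby(level=0).max() step is an addition, not part of the answer: it collapses the one-row-per-viewing result into one row per user.

```python
import pandas as pd

# A few rows taken from the question's data
df = pd.DataFrame({
    "userId": [60265, 60265, 60265, 19254, 19254],
    "movieId": [2123, 2291, 2355, 2123, 2291],
})

# One-hot encode movieId while userId is the index
dummies = pd.get_dummies(df.set_index("userId")["movieId"], dtype=int)

# Collapse to one row per user; max() keeps a 0/1 indicator
# even if the same (userId, movieId) pair appears more than once
table = dummies.groupby(level=0, sort=False).max()
print(table)
```

Using max() rather than sum() is a design choice: it yields a watched/not-watched flag, while sum() would count repeated viewings.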

Upvotes: 2

Scott Boston

Reputation: 153460

You use pd.get_dummies like this:

(pd.get_dummies(df.set_index('userId'), columns=['movieId'], prefix='', prefix_sep='')
   # sum the dummy rows per userId (index level 0), keeping first-seen order
   .groupby(level=0, sort=False).sum()
   .reset_index())

Output:

   userId  2123  2291  2329  2355  2389  2396  2402  2403  2421
0   60265     1     1     1     1     1     1     1     1     1
1   19254     1     1     1     0     1     1     1     1     1
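
The table requested in the question also lists movies 2592 and 2596, which neither user watched, so they never show up as dummy columns. A minimal sketch, assuming the full set of movie ids is known in advance (the all_movies list here is hypothetical and simply copied from the question's header row), could add them with reindex:

```python
import pandas as pd

# The question's data: 9 movies for user 60265, 8 for user 19254
df = pd.DataFrame({
    "userId": [60265] * 9 + [19254] * 8,
    "movieId": [2123, 2291, 2329, 2355, 2389, 2396, 2402, 2403, 2421,
                2389, 2396, 2402, 2403, 2421, 2123, 2291, 2329],
})

# Hypothetical full catalogue, including movies nobody watched
all_movies = [2123, 2291, 2329, 2355, 2389, 2396, 2402, 2403, 2421, 2592, 2596]

table = (pd.get_dummies(df.set_index("userId")["movieId"], dtype=int)
           .groupby(level=0, sort=False).sum()          # one row per user
           .reindex(columns=all_movies, fill_value=0))  # add unwatched movies as 0
print(table)
```

Note that in this sketch the column labels stay integers, whereas the prefix='' variant above produces string labels.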

Upvotes: 3
