Reputation: 165
I need to build a userId × movieId table from a DataFrame with two columns, userId and movieId:
userId movieId
60265 2123
60265 2291
60265 2329
60265 2355
60265 2389
60265 2396
60265 2402
60265 2403
60265 2421
19254 2389
19254 2396
19254 2402
19254 2403
19254 2421
19254 2123
19254 2291
19254 2329
Each userId has watched more than one movieId. I intend to use a histogram to show the distribution of all movies watched by each user, like this:
userId/movieId 2123 2291 2329 2355 2389 2396 2402 2403 2421 2592 2596
60265 1 1 1 1 1 1 1 1 1 0 0
19254 1 1 1 0 1 1 1 1 1 0 0
How can I use the get_dummies() function to construct a similar userId × movieId table?
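To try the answers on the question's own data, the sample rows can be loaded into a DataFrame. This is a minimal sketch that assumes the rows are pasted as whitespace-separated text; the variable name df is chosen to match the second answer, which uses df without defining it.

```python
import io

import pandas as pd

# The sample rows from the question, pasted as whitespace-separated text
sample = """userId movieId
60265 2123
60265 2291
60265 2329
60265 2355
60265 2389
60265 2396
60265 2402
60265 2403
60265 2421
19254 2389
19254 2396
19254 2402
19254 2403
19254 2421
19254 2123
19254 2291
19254 2329
"""

# sep=r"\s+" splits each line on any run of whitespace
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(df.shape)  # (17, 2)
```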
Upvotes: 3
Views: 87
Reputation: 6099
You need to set the index and then use get_dummies. Here is the full code:
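import pandas as pd
data = {"movie": [2123, 2126, 2123], "userId": [1, 1, 2]}
df = pd.DataFrame(data)
df.set_index('userId', inplace=True)
# One 0/1 column per movie id; drop() without inplace so the result is kept
result = pd.concat([df, pd.get_dummies(df['movie'], prefix='movie', dtype=int)], axis=1).drop(columns=['movie'])
# Collapse the duplicate userId rows into one row per user
result = result.groupby(level=0).sum()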
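A common pitfall with the concat-then-drop pattern: drop(..., inplace=True) mutates the frame it is called on and returns None, so when that frame is the temporary returned by pd.concat, nothing is left to use. The sketch below reuses the toy data from this answer and also shows that rows for the same userId stay separate until they are aggregated, here with groupby(level=0).sum() (an addition for illustration, not part of the original answer).

```python
import pandas as pd

# Same toy data as the answer: userId 1 watched two movies, userId 2 watched one
df = pd.DataFrame({"movie": [2123, 2126, 2123], "userId": [1, 1, 2]}).set_index("userId")
dummies = pd.get_dummies(df["movie"], prefix="movie", dtype=int)

# drop(..., inplace=True) mutates the temporary frame built by pd.concat
# and returns None, so the one-hot columns are lost
chained = pd.concat([df, dummies], axis=1).drop(["movie"], axis=1, inplace=True)
print(chained)  # None

# Without inplace, drop() returns a new frame; groupby(level=0).sum()
# then merges the two rows for userId 1 into a single row
table = pd.concat([df, dummies], axis=1).drop(columns=["movie"]).groupby(level=0).sum()
print(table)
```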
Upvotes: 2
Reputation: 153460
You can use pd.get_dummies like this:
(pd.get_dummies(df.set_index('userId'), columns=['movieId'], prefix='', prefix_sep='', dtype=int)
 .groupby(level=0, sort=False).sum()  # sum(level=0) was removed in pandas 2.0
 .reset_index())
Output:
userId 2123 2291 2329 2355 2389 2396 2402 2403 2421
0 60265 1 1 1 1 1 1 1 1 1
1 19254 1 1 1 0 1 1 1 1 1
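A different route, not get_dummies itself: pd.crosstab counts the (userId, movieId) pairs directly and produces the same kind of table in one call. This sketch uses a hypothetical five-row subset of the question's data, clips counts to 0/1 in case a user appears with the same movie twice, and uses reindex to add the all-zero columns 2592 and 2596 that appear in the question's expected table but not in its sample rows.

```python
import pandas as pd

# Hypothetical subset of the question's rows, same column names
df = pd.DataFrame({
    "userId": [60265, 60265, 60265, 19254, 19254],
    "movieId": [2123, 2291, 2355, 2123, 2291],
})

# crosstab counts how often each (userId, movieId) pair occurs
table = pd.crosstab(df["userId"], df["movieId"])

# clip(upper=1) turns counts into watched / not-watched flags
table = table.clip(upper=1)

# reindex adds movie ids nobody in this sample watched as all-zero columns
all_movies = sorted(set(table.columns) | {2592, 2596})
table = table.reindex(columns=all_movies, fill_value=0)
print(table)
```

Note that crosstab sorts the userId index in ascending order, so 19254 comes before 60265.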
Upvotes: 3