Reputation: 3
I have created the following pandas dataframe:
+------+-------+------------+
|userID|movieID|timesWatched|
+------+-------+------------+
|u1 |mv1 |5 |
|u1 |mv2 |2 |
|u2 |mv1 |1 |
|u3 |mv4 |30 |
+------+-------+------------+
I also have a list with 6 movies like this:
movies =['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']
What I would like to do is to create for every user a list like this:
u1 : [0, 5, 2, 0, 0, 0]
u2 : [0, 1, 0, 0, 0, 0]
u2 : [0, 0, 0, 0, 30, 0]
Is there a nice pythonic / pandas way of this, avoiding confusing for loops?
Upvotes: 0
Views: 127
Reputation: 260580
You can use categorical data and pivot_table
and convert to_dict
with the "list" format.
The dropna=False
option of pivot_table
combined with categorical data ensures to have all categories, even if all are NaNs.
movies =['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']
(df.assign(movieID=pd.Categorical(df['movieID'], categories=movies))
.pivot_table(index='movieID',
columns='userID',
values='timesWatched',
dropna=False, fill_value=0)
.to_dict('list')
)
Upvotes: 1