zptr
zptr

Reputation: 3

Python Pandas turn a dataframe into counts list or a vector

I have created the following pandas dataframe:

+------+-------+------------+
|userID|movieID|timesWatched|
+------+-------+------------+
|u1    |mv1    |5           |
|u1    |mv2    |2           |
|u2    |mv1    |1           |
|u3    |mv4    |30          |
+------+-------+------------+ 

I also have a list with 6 movies like this:

 movies =['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']

What I would like to do is to create for every user a list like this:

u1 : [0, 5, 2, 0, 0, 0]
u2 : [0, 1, 0, 0, 0, 0]
u2 : [0, 0, 0, 0, 30, 0]

Is there a nice pythonic / pandas way of this, avoiding confusing for loops?

Upvotes: 0

Views: 127

Answers (1)

mozway
mozway

Reputation: 260580

You can use categorical data and pivot_table and convert to_dict with the "list" format.

The dropna=False option of pivot_table combined with categorical data ensures to have all categories, even if all are NaNs.

movies =['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']

(df.assign(movieID=pd.Categorical(df['movieID'], categories=movies))
   .pivot_table(index='movieID',
                columns='userID',
                values='timesWatched',
                dropna=False, fill_value=0)
   .to_dict('list')
)

Upvotes: 1

Related Questions