Reputation: 2643
I have a DataFrame that looks like this:
import pandas as pd
data = [
    {
        "userId": 1,
        "binary_vote": 0,
        "genres": ["Adventure", "Comedy"]
    },
    {
        "userId": 1,
        "binary_vote": 1,
        "genres": ["Adventure", "Drama"]
    },
    {
        "userId": 2,
        "binary_vote": 0,
        "genres": ["Comedy", "Drama"]
    },
    {
        "userId": 2,
        "binary_vote": 1,
        "genres": ["Adventure", "Drama"]
    },
]
df = pd.DataFrame(data)
print(df)
   userId  binary_vote               genres
0       1            0  [Adventure, Comedy]
1       1            1   [Adventure, Drama]
2       2            0      [Comedy, Drama]
3       2            1   [Adventure, Drama]
I want to create one column for each value of binary_vote. Here is the expected output:
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
I tried the following, but it raises an error:
pd.pivot_table(df, columns=['binary_vote'], values='genres')
Here is the error:
DataError: No numeric types to aggregate
Any idea? Thanks in advance.
Upvotes: 1
Views: 53
Reputation: 75100
Another way, using set_index() and unstack():
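# unstack() moves binary_vote into the columns under an outer 'genres' level
m = (df.set_index(['userId', 'binary_vote']).unstack()
       .add_prefix('binary_vote_').droplevel(level=0, axis=1))
# restore userId as a column and drop the leftover columns-axis name
m.reset_index().rename_axis(None, axis=1)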
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
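For what it's worth, here is a minimal sketch of the same set_index()/unstack() idea that selects the genres column first, so the columns stay flat and no level has to be dropped. It assumes every (userId, binary_vote) pair occurs only once, since unstack() raises on duplicate index entries. The name wide is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2],
        "binary_vote": [0, 1, 0, 1],
        "genres": [
            ["Adventure", "Comedy"],
            ["Adventure", "Drama"],
            ["Comedy", "Drama"],
            ["Adventure", "Drama"],
        ],
    }
)

wide = (
    # Selecting "genres" yields a Series indexed by (userId, binary_vote).
    df.set_index(["userId", "binary_vote"])["genres"]
    # Move the binary_vote index level into the columns.
    .unstack("binary_vote")
    .add_prefix("binary_vote_")
    # Turn userId back into a regular column.
    .reset_index()
    # Remove the "binary_vote" name left on the columns axis.
    .rename_axis(None, axis=1)
)
print(wide)
```

Working on the Series rather than the whole frame is a small design choice: it avoids the MultiIndex columns, at the cost of handling only one value column at a time.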
Upvotes: 1
Reputation: 42916
We have to pass our own aggfunc; in this case it's a simple one.
Your attempt failed because pivot_table tried to take the mean, which is the default aggregation function, and a mean cannot be computed over lists.
piv = (
    df.pivot_table(index='userId', columns='binary_vote', values='genres', aggfunc=lambda x: x)
    .add_prefix('binary_vote_')
    .reset_index()
    .rename_axis(None, axis=1)
)
print(piv)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
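As a variation on the custom aggfunc, here is a sketch that passes the built-in "first" aggregation by name instead of a lambda. This swaps in a different aggregation than the one above: it keeps the first genres list in each group, so it only reproduces the expected output under the assumption that each (userId, binary_vote) pair appears once. With duplicates it would silently keep only the first list. The name piv_first is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2],
        "binary_vote": [0, 1, 0, 1],
        "genres": [
            ["Adventure", "Comedy"],
            ["Adventure", "Drama"],
            ["Comedy", "Drama"],
            ["Adventure", "Drama"],
        ],
    }
)

# "first" keeps the first genres list seen for each (userId, binary_vote)
# group, so no numeric aggregation is attempted on the lists.
piv_first = (
    df.pivot_table(
        index="userId",
        columns="binary_vote",
        values="genres",
        aggfunc="first",
    )
    .add_prefix("binary_vote_")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(piv_first)
```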
Upvotes: 3