How do I create a pivot table from a dataframe that has a column containing lists?

Question

I have a dataframe that looks like this:

import pandas as pd

data = [
  {
    "userId": 1,
    "binary_vote": 0,
    "genres": [
      "Adventure",
      "Comedy"
    ]
  },
  {
    "userId": 1,
    "binary_vote": 1,
    "genres": [
      "Adventure",
      "Drama"
    ]
  },
  {
    "userId": 2,
    "binary_vote": 0,
    "genres": [
      "Comedy",
      "Drama"
    ]
  },
  {
    "userId": 2,
    "binary_vote": 1,
    "genres": [
      "Adventure",
      "Drama"
    ]
  },
]

df = pd.DataFrame(data)
print(df)

   userId  binary_vote               genres
0  1       0            [Adventure, Comedy]
1  1       1            [Adventure, Drama]
2  2       0            [Comedy, Drama]
3  2       1            [Adventure, Drama]

I want to create one column for each value of binary_vote, holding the matching genres list for each userId. Here is the expected output:

   userId        binary_vote_0       binary_vote_1
0  1       [Adventure, Comedy]  [Adventure, Drama]
1  2       [Comedy, Drama]      [Adventure, Drama]

I tried something like this, but I get an error:

pd.pivot_table(df, columns=['binary_vote'], values='genres')

Here is the error:

DataError: No numeric types to aggregate

Any ideas? Thanks in advance.

Erfan · Accepted Answer

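Your attempt failed because pivot_table uses the mean as its default aggregation function, and a mean cannot be computed for a column of lists.

We have to pass our own aggfunc instead. In this case it is a simple one that returns each group unchanged, which is enough here because every combination of userId and binary_vote occurs exactly once. The call also needs index='userId' so that each user gets its own row.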

piv = (
    df.pivot_table(index='userId', columns='binary_vote', values='genres', aggfunc=lambda x: x)
      .add_prefix('binary_vote_')
      .reset_index()
      .rename_axis(None, axis=1)
)
print(piv)

   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
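
As a possible alternative to the identity lambda, here is a minimal sketch that uses the built-in 'first' aggregation. It assumes, as in the sample data, that each (userId, binary_vote) pair appears only once; the name piv_first is purely illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question, written compactly.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

# 'first' keeps the first value of each group, so it never tries to
# compute a numeric aggregate such as the mean on the list column.
piv_first = (
    df.pivot_table(index="userId", columns="binary_vote",
                   values="genres", aggfunc="first")
      .add_prefix("binary_vote_")   # 0 -> binary_vote_0, 1 -> binary_vote_1
      .reset_index()                # turn userId back into a regular column
      .rename_axis(None, axis=1)    # drop the leftover 'binary_vote' axis name
)
print(piv_first)
```

If a (userId, binary_vote) pair could occur more than once, 'first' would keep only the first list in that group, so it is worth checking for duplicates before relying on it.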
