pariskey
pariskey

Reputation: 53

Pivot data-frame with list as value in column?

I have a data-frame like this:

name A B i
x 3 [1,1,1] 1
y 3 4 1
z 5 [1,1,1] 1
x 5 3 2
y 5 7 2
z 7 3 2

What I want is this:

x_A x_B y_A y_B z_A z_B i
3 [1,1,1] 3 4 5 [1,1,1] 1
5 3 5 7 3 3 2

My code so far looks like this:

df = df.pivot_table(column= 'name', values =['A','B'])
df2 = df.unstack().to_frame().T
df2.columns = df2.columns.map('_'.join)

However, when I run this, it seems to skip the columns that have lists (ie. column B) and gives me:

x_A y_A z_A
3 3 5

Is there another way to solve this? Am I missing something? TIA.

Upvotes: 1

Views: 123

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35676

We can use pivot_table with aggfunc of first (to handle object types like list) and sort_index to group level 1 keys together. Then collapse the MultiIndex with Index.swaplevel and Index.map. Lastly, return i to the columns with DataFrame.reset_index:

out_df = (
    df.pivot_table(
        index='i',
        columns='name',
        aggfunc='first'
    ).sort_index(axis=1, level=1)
)
out_df.columns = out_df.columns.swaplevel().map('_'.join)
out_df = out_df.reset_index()

out_df:

i x_A x_B y_A y_B z_A z_B
0 1 3 [1, 1, 1] 3 4 5 [1, 1, 1]
1 2 5 3 5 7 7 3

Setup:

import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

The pyjanitor module has an abstraction for this operation called pivot_wider which simplifies this transformation to:

out_df = df.pivot_wider(index='i', names_from='name')
i x_A y_A z_A x_B y_B z_B
0 1 3 3 5 [1, 1, 1] 4 [1, 1, 1]
1 2 5 5 7 3 7 3

Complete Working Example:

# pip install pyjanitor
# conda install pyjanitor -c conda-forge
import janitor
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

out_df = df.pivot_wider(index='i', names_from='name')
print(out_df)

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195528

Try:

df = df.set_index("name").stack().to_frame().T
df.columns = df.columns.map("_".join)
print(df)

Prints:

  x_A        x_B y_A y_B z_A        z_B
0   3  [1, 1, 1]   3   4   5  [1, 1, 1]

EDIT: With updated question:

df = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)
df.columns = df.columns.map("_".join)
print(df.reset_index())

Prints:

   i  x_A  y_A  z_A        x_B y_B        z_B
0  1    3    3    5  [1, 1, 1]   4  [1, 1, 1]
1  2    5    5    7          3   7          3

Upvotes: 2

tozCSS
tozCSS

Reputation: 6124

see https://numpy.org/doc/stable/reference/generated/numpy.ravel.html

from io import StringIO
df = """name    A   B
x   3   [1,1,1]
y   3   4
z   5   [1,1,1]"""
df = pd.read_table(StringIO(df)).set_index('name')
s = pd.Series(df.values.ravel(),
              index=[i+'_'+c for i in df.index for c in df.columns])
s.to_frame().T
x_A x_B y_A y_B z_A z_B
0 3 [1,1,1] 3 4 5 [1,1,1]

Upvotes: 0

Related Questions