Reputation: 53

Pivot data-frame with list as value in column?

I have a data-frame like this:

name	A	B	i
x	3	[1,1,1]	1
y	3	4	1
z	5	[1,1,1]	1
x	5	3	2
y	5	7	2
z	7	3	2

What I want is this:

x_A	x_B	y_A	y_B	z_A	z_B	i
3	[1,1,1]	3	4	5	[1,1,1]	1
5	3	5	7	3	3	2

My code so far looks like this:

df = df.pivot_table(column= 'name', values =['A','B'])
df2 = df.unstack().to_frame().T
df2.columns = df2.columns.map('_'.join)

However, when I run this, it seems to skip the columns that have lists (ie. column B) and gives me:

x_A	y_A	z_A
3	3	5

Is there another way to solve this? Am I missing something? TIA.

Upvotes: 1

Answers (3)

Henry Ecker

Reputation: 35676

We can use pivot_table with aggfunc of first (to handle object types like list) and sort_index to group level 1 keys together. Then collapse the MultiIndex with Index.swaplevel and Index.map. Lastly, return i to the columns with DataFrame.reset_index:

out_df = (
    df.pivot_table(
        index='i',
        columns='name',
        aggfunc='first'
    ).sort_index(axis=1, level=1)
)
out_df.columns = out_df.columns.swaplevel().map('_'.join)
out_df = out_df.reset_index()

out_df:

	i	x_A	x_B	y_A	y_B	z_A	z_B
0	1	3	[1, 1, 1]	3	4	5	[1, 1, 1]
1	2	5	3	5	7	7	3

Setup:

import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

The pyjanitor module has an abstraction for this operation called pivot_wider which simplifies this transformation to:

out_df = df.pivot_wider(index='i', names_from='name')

	i	x_A	y_A	z_A	x_B	y_B	z_B
0	1	3	3	5	[1, 1, 1]	4	[1, 1, 1]
1	2	5	5	7	3	7	3

Complete Working Example:

# pip install pyjanitor
# conda install pyjanitor -c conda-forge
import janitor
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z', 'x', 'y', 'z'],
    'A': [3, 3, 5, 5, 5, 7],
    'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
    'i': [1, 1, 1, 2, 2, 2]
})

out_df = df.pivot_wider(index='i', names_from='name')
print(out_df)

Upvotes: 1

Andrej Kesely

Reputation: 195528

Try:

df = df.set_index("name").stack().to_frame().T
df.columns = df.columns.map("_".join)
print(df)

Prints:

  x_A        x_B y_A y_B z_A        z_B
0   3  [1, 1, 1]   3   4   5  [1, 1, 1]

EDIT: With updated question:

df = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)
df.columns = df.columns.map("_".join)
print(df.reset_index())

Prints:

   i  x_A  y_A  z_A        x_B y_B        z_B
0  1    3    3    5  [1, 1, 1]   4  [1, 1, 1]
1  2    5    5    7          3   7          3

Upvotes: 2

tozCSS

Reputation: 6124

see https://numpy.org/doc/stable/reference/generated/numpy.ravel.html

from io import StringIO
df = """name    A   B
x   3   [1,1,1]
y   3   4
z   5   [1,1,1]"""
df = pd.read_table(StringIO(df)).set_index('name')
s = pd.Series(df.values.ravel(),
              index=[i+'_'+c for i in df.index for c in df.columns])
s.to_frame().T

	x_A	x_B	y_A	y_B	z_A	z_B
0	3	[1,1,1]	3	4	5	[1,1,1]

Upvotes: 0

Pivot data-frame with list as value in column?

Answers (3)

Related Questions