Reputation: 53
I have a data-frame like this:
name | A | B | i |
---|---|---|---|
x | 3 | [1,1,1] | 1 |
y | 3 | 4 | 1 |
z | 5 | [1,1,1] | 1 |
x | 5 | 3 | 2 |
y | 5 | 7 | 2 |
z | 7 | 3 | 2 |
What I want is this:
x_A | x_B | y_A | y_B | z_A | z_B | i |
---|---|---|---|---|---|---|
3 | [1,1,1] | 3 | 4 | 5 | [1,1,1] | 1 |
5 | 3 | 5 | 7 | 3 | 3 | 2 |
My code so far looks like this:
df = df.pivot_table(column= 'name', values =['A','B'])
df2 = df.unstack().to_frame().T
df2.columns = df2.columns.map('_'.join)
However, when I run this, it seems to skip the columns that have lists (ie. column B) and gives me:
x_A | y_A | z_A |
---|---|---|
3 | 3 | 5 |
Is there another way to solve this? Am I missing something? TIA.
Upvotes: 1
Views: 123
Reputation: 35676
We can use pivot_table
with aggfunc
of first
(to handle object types like list) and sort_index
to group level 1 keys together. Then collapse the MultiIndex with Index.swaplevel
and Index.map
. Lastly, return i
to the columns with DataFrame.reset_index
:
out_df = (
df.pivot_table(
index='i',
columns='name',
aggfunc='first'
).sort_index(axis=1, level=1)
)
out_df.columns = out_df.columns.swaplevel().map('_'.join)
out_df = out_df.reset_index()
out_df
:
i | x_A | x_B | y_A | y_B | z_A | z_B | |
---|---|---|---|---|---|---|---|
0 | 1 | 3 | [1, 1, 1] | 3 | 4 | 5 | [1, 1, 1] |
1 | 2 | 5 | 3 | 5 | 7 | 7 | 3 |
Setup:
import pandas as pd
df = pd.DataFrame({
'name': ['x', 'y', 'z', 'x', 'y', 'z'],
'A': [3, 3, 5, 5, 5, 7],
'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
'i': [1, 1, 1, 2, 2, 2]
})
The pyjanitor module has an abstraction for this operation called pivot_wider
which simplifies this transformation to:
out_df = df.pivot_wider(index='i', names_from='name')
i | x_A | y_A | z_A | x_B | y_B | z_B | |
---|---|---|---|---|---|---|---|
0 | 1 | 3 | 3 | 5 | [1, 1, 1] | 4 | [1, 1, 1] |
1 | 2 | 5 | 5 | 7 | 3 | 7 | 3 |
Complete Working Example:
# pip install pyjanitor
# conda install pyjanitor -c conda-forge
import janitor
import pandas as pd
df = pd.DataFrame({
'name': ['x', 'y', 'z', 'x', 'y', 'z'],
'A': [3, 3, 5, 5, 5, 7],
'B': [[1, 1, 1], '4', [1, 1, 1], '3', '7', '3'],
'i': [1, 1, 1, 2, 2, 2]
})
out_df = df.pivot_wider(index='i', names_from='name')
print(out_df)
Upvotes: 1
Reputation: 195528
Try:
df = df.set_index("name").stack().to_frame().T
df.columns = df.columns.map("_".join)
print(df)
Prints:
x_A x_B y_A y_B z_A z_B
0 3 [1, 1, 1] 3 4 5 [1, 1, 1]
EDIT: With updated question:
df = df.set_index(["name", "i"]).unstack(level=0).swaplevel(axis=1)
df.columns = df.columns.map("_".join)
print(df.reset_index())
Prints:
i x_A y_A z_A x_B y_B z_B
0 1 3 3 5 [1, 1, 1] 4 [1, 1, 1]
1 2 5 5 7 3 7 3
Upvotes: 2
Reputation: 6124
see https://numpy.org/doc/stable/reference/generated/numpy.ravel.html
from io import StringIO
df = """name A B
x 3 [1,1,1]
y 3 4
z 5 [1,1,1]"""
df = pd.read_table(StringIO(df)).set_index('name')
s = pd.Series(df.values.ravel(),
index=[i+'_'+c for i in df.index for c in df.columns])
s.to_frame().T
x_A | x_B | y_A | y_B | z_A | z_B | |
---|---|---|---|---|---|---|
0 | 3 | [1,1,1] | 3 | 4 | 5 | [1,1,1] |
Upvotes: 0