Reputation: 435
I have the following data structure:
from collections import OrderedDict
import pandas as pd
d = OrderedDict([
((5, 3, 1), {'y1': 1}),
((5, 3, 2), {'y2': 2}),
((5, 4, 1), {'y1': 10}),
((5, 4, 2), {'y2': 20}),
((6, 3, 1), {'y1': 100}),
((6, 3, 2), {'y2': 200}),
((6, 4, 1), {'y1': 1000}),
((6, 4, 2), {'y2': 2000}),
])
df = pd.DataFrame(
d.values(),
index=pd.MultiIndex.from_tuples(d.keys(), names=['x3', 'x2', 'x1']),
)
The table looks like
y1 y2
x3 x2 x1
5 3 1 1 NaN
2 NaN 2
4 1 10 NaN
2 NaN 20
6 3 1 100 NaN
2 NaN 200
4 1 1000 NaN
2 NaN 2000
As you can see there is a one-to-one mapping between x1 and the columns (x1=1: y1, x1=2: y2) which I want to flatten into
y1 y2
x3 x2
5 3 1 2
4 10 20
6 3 100 200
4 1000 2000
How can I do it?
Edit: Or the other way round:
y
x3 x2 x1
5 3 1 1
2 2
4 1 10
2 20
6 3 1 100
2 200
4 1 1000
2 2000
Upvotes: 2
Views: 228
Reputation: 862591
You can use stack
for remove NaN
, because create Series
, remove third
level by reset_index
and last reshape by unstack
:
print (df.stack().reset_index(level=2,drop=True).unstack(2))
y1 y2
x3 x2
5 3 1.0 2.0
4 10.0 20.0
6 3 100.0 200.0
4 1000.0 2000.0
If need cast to int
add astype
:
print (df.stack().reset_index(level=2,drop=True).unstack(2).astype(int))
y1 y2
x3 x2
5 3 1 2
4 10 20
6 3 100 200
4 1000 2000
EDIT:
print (df.stack().reset_index(level=3,drop=True).to_frame('y').astype(int))
y
x3 x2 x1
5 3 1 1
2 2
4 1 10
2 20
6 3 1 100
2 200
4 1 1000
2 2000
Upvotes: 2
Reputation: 4838
df2 = df.unstack()
df2.columns = range(4)
df3 = df2.drop([1,2], axis=1)
df3.columns = ["Y1", "Y2"]
df3
gives
Upvotes: 0