Reputation: 301
I have a little problem with the transformation from wide to long on a dataset. I tried with melt but I didn't get a good result. I hope that someone could help me. The dataset is as follow:
pd.DataFrame({'id': [0, 1, 2, 3, 4, 5],
'type': ['a', 'b', 'c', 'd', 'e', 'f'],
'rank': ['alpha', 'beta', 'gamma', 'epsilon', 'phi', 'ro'],
'type.1': ['d', 'g', 'z', 'a', 'nan', 'nan'],
'rank.1': ['phi', 'sigma', 'gamma', 'lambda', 'nan', 'nan'],
'type.2': ['nan', 'nan', 'j', 'r', 'nan', 'nan'],
'rank.2': ['nan', 'nan', 'eta', 'theta', 'nan', 'nan']})
And I need the dataset in this way:
pd.DataFrame({'id': [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
'type': ['a', 'd', 'b', 'g', 'c', 'z', 'j', 'd', 'a', 'r', 'e', 'f'],
'rank': ['alpha', 'phi', 'beta', 'sigma', 'gamma', 'gamma', 'eta', 'epsilon', 'lambda', 'theta', 'phi', 'ro']})
Can anyone help me with that? Thanks a lot
Upvotes: 2
Views: 367
Reputation: 28709
One option is pivot_longer from pyjanitor, which abstracts the reshaping process:
# pip install janitor
import janitor
(df
.pivot_longer(
index = 'id',
names_to = ['type', 'rank'],
names_pattern = ['type', 'rank'],
sort_by_appearance = True)
.loc[lambda df: ~df.eq('nan').any(1)]
)
id type rank
0 0 a alpha
1 0 d phi
3 1 b beta
4 1 g sigma
6 2 c gamma
7 2 z gamma
8 2 j eta
9 3 d epsilon
10 3 a lambda
11 3 r theta
12 4 e phi
15 5 f ro
The idea for this particular reshape is that each regex in names_pattern
is used to pair the matching column with the paired name in names_to
.
Upvotes: 0
Reputation: 215057
Use wide_to_long
:
# normalize the `type` and `rank` columns so they have the same format as others
df = df.rename(columns={'type': 'type.0', 'rank': 'rank.0'})
(pd.wide_to_long(df, stubnames=['type', 'rank'], i='id', j='var', sep='.')
[lambda x: (x['type'] != 'nan') | (x['rank'] != 'nan')].reset_index())
id var type rank
0 0 0 a alpha
1 1 0 b beta
2 2 0 c gamma
3 3 0 d epsilon
4 4 0 e phi
5 5 0 f ro
6 0 1 d phi
7 1 1 g sigma
8 2 1 z gamma
9 3 1 a lambda
10 2 2 j eta
11 3 2 r theta
You can drop the var
column if not needed.
Upvotes: 3