Reputation: 3608
I have a dataframe 6k columns wide, of the format:
import pandas as pd
df = pd.DataFrame(
    [('jan 1 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])
df
date a_1 a_2 a_3 b_1 b_2 b_3 c_1 c_2 c_3
0 jan 1 2000 a b c 1 2 3 aa bb cc
1 jan 2 2000 d e f 4 5 6 dd ee ff
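I want to reshape this to long format: one row per date and column prefix (a, b, c), with the prefix in an ID column and one column per numeric suffix (1, 2, 3).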
I have looked at "Pandas Melt several groups of columns into multiple target columns by name" and "Pandas: Multiple columns into one column", but I have been unable to work out a correct solution.
Any suggestions are appreciated.
Upvotes: 3
Views: 1442
Reputation: 28644
One option is the pivot_longer function from pyjanitor, using the .value placeholder:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(
    index='date',
    names_to=('ID', '.value'),
    names_sep='_',
    sort_by_appearance=True)
date ID 1 2 3
0 jan 1 2000 a a b c
1 jan 1 2000 b 1 2 3
2 jan 1 2000 c aa bb cc
3 jan 2 2000 a d e f
4 jan 2 2000 b 4 5 6
5 jan 2 2000 c dd ee ff
Upvotes: 0
Reputation: 862681
Create a MultiIndex in the columns with str.split, then reshape with DataFrame.stack on the first level:
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(('date', 'ID')).reset_index()
print(df)
date ID 1 2 3
0 2000-01-01 a a b c
1 2000-01-01 b 1 2 3
2 2000-01-01 c aa bb cc
3 2000-01-02 a d e f
4 2000-01-02 b 4 5 6
5 2000-01-02 c dd ee ff
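For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the sample frame from the question. The names wide, stacked and out are only illustrative, and the date parsing step is left out to keep it short. Recent pandas versions may emit a FutureWarning about the stack implementation; the sketch assumes the result is otherwise the same.

```python
import pandas as pd

df = pd.DataFrame(
    [('jan 1 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])

# Move 'date' out of the columns so only the prefix_suffix names remain.
wide = df.set_index('date')

# 'a_1' -> ('a', '1'): the columns become a two-level MultiIndex.
wide.columns = wide.columns.str.split('_', expand=True)

# Stack the first column level (the prefix) into the row index.
stacked = wide.stack(0)

# Name the two index levels and turn them back into ordinary columns.
out = stacked.rename_axis(['date', 'ID']).reset_index()
print(out)
```

Note that the suffix columns are the strings '1', '2', '3' here, since they come from splitting the column names.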
Upvotes: 4
Reputation: 153460
Use pd.wide_to_long and some DataFrame reshaping.
pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='date', j='ID', sep='_')\
    .rename_axis('ID', axis=1)\
    .stack()\
    .unstack(1)\
    .reset_index()
Output:
ID date ID 1 2 3
0 jan 1, 2000 a a b c
1 jan 1, 2000 b 1 2 3
2 jan 1, 2000 c aa bb cc
3 jan 2, 2000 a d e f
4 jan 2, 2000 b 4 5 6
5 jan 2, 2000 c dd ee ff
Where df is:
df = pd.DataFrame([('jan 1, 2000','a','b','c',1,2,3,'aa','bb','cc'), ('jan 2, 2000','d', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
columns=['date','a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])
df
Input df:
date a_1 a_2 a_3 b_1 b_2 b_3 c_1 c_2 c_3
0 jan 1, 2000 a b c 1 2 3 aa bb cc
1 jan 2, 2000 d e f 4 5 6 dd ee ff
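To see why the extra reshaping is needed, it may help to look at what pd.wide_to_long returns by itself. The sketch below is only an illustration on the same sample frame (the variable name long_df is made up): the prefixes a, b, c end up as columns and the numeric suffix ends up in the index, which is the transpose of the layout asked for, hence the stack/unstack afterwards.

```python
import pandas as pd

df = pd.DataFrame(
    [('jan 1, 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2, 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])

# Intermediate result only: one row per (date, suffix), one column per prefix.
long_df = pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='date', j='ID', sep='_')

print(long_df.index.names)    # index levels: date and the suffix (named 'ID')
print(list(long_df.columns))  # the prefixes used as stubnames
print(long_df)
```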
Upvotes: 5