Reputation: 391
Original DataFrame:
+----+----------+----------+----------+----------+
| ID | var1hrs | var2hrs | ind1var | ind2var |
+----+----------+----------+----------+----------+
| 1 | 55 | 45 | 123 | 456 |
| 2 | 48 | 60 | 331 | 222 |
+----+----------+----------+----------+----------+
Target DataFrame:
+----+------------+------+------+
| ID | type | hrs | ind |
+----+------------+------+------+
| 1 | primary | 55 | 123 |
| 1 | secondary | 45 | 456 |
| 2 | primary | 48 | 331 |
| 2 | secondary | 60 | 222 |
+----+------------+------+------+
How would I go about melting multiple groups of variables into a single label column? The "1" in the variable names indicate type = "primary" and "2" indicates type = "secondary".
Upvotes: 3
Views: 1351
Reputation: 402363
(Comments inlined)
# set ID as the index and sort columns
df = df.set_index('ID').sort_index(axis=1)
# extract primary columns
prim = df.filter(like='1')
prim.columns = ['ind', 'vars']
# extract secondary columns
sec = df.filter(like='2')
sec.columns = ['ind', 'vars']
# concatenation + housekeeping
v = (pd.concat([prim, sec], keys=['primary', 'secondary'])
.swaplevel(0, 1)
.rename_axis(['ID', 'type'])
.reset_index()
)
print(v)
ID type ind vars
0 1 primary 123 55
1 2 primary 331 48
2 1 secondary 456 45
3 2 secondary 222 60
This is more or less one efficient way of doing it, even if the steps are a bit involved.
Upvotes: 4
Reputation: 323226
After modify the columns' name, we can using wide_to_long
df.columns=df.columns.str[:4]
s=pd.wide_to_long(df,['var','ind'],i='ID',j='type').reset_index()
s=s.assign(type=s.type.map({'1':'primary','2':'secondary'})).sort_values('ID')
s
ID type var ind
0 1 primary 55 123
2 1 secondary 45 456
1 2 primary 48 331
3 2 secondary 60 222
Upvotes: 5