Reputation: 117
I am struggling a bit with some pandas reshaping. I have this dataframe:
+---------+----------+----------+----------+------------+
| Column1 | Sample 1 | Sample 2 | Sample 3 | Identifier |
+---------+----------+----------+----------+------------+
| 11 | 1.0 | 1.1 | 1.3 | ABC |
| 12 | 2.0 | 1.9 | 2.2 | ABC |
| 13 | 6.7 | 2.3 | 8.9 | ABC |
| 11 | 1.0 | 1.1 | 1.3 | DEF |
| 12 | 2.0 | 1.9 | 2.2 | DEF |
| 13 | 6.7 | 2.3 | 8.9 | DEF |
+---------+----------+----------+----------+------------+
And I would like to reshape it to this:
+---------+--------+------------+--------+
| Column1 | Sample | Identifier | Result |
+---------+--------+------------+--------+
| 11 | 1 | ABC | 1.0 |
| 11 | 2 | ABC | 1.1 |
| 11 | 3 | ABC | 1.3 |
| 11 | 1 | DEF | 1.0 |
| 11 | 2 | DEF | 1.1 |
| 11 | 3 | DEF | 1.3 |
| 12 | 1 | ABC | 2.0 |
| 12 | 2 | ABC | 1.9 |
| 12 | 3 | ABC | 2.2 |
| ... | | | |
+---------+--------+------------+--------+
Essentially, it has been given in a really horrible excel sheet with repeated samples as a collection of columns.
Ideally I want Column1 as the eventual column headers, with Sample and Identifier as the index and Result as values, but I know I need it in the shape above to do that relatively straighforward with df.pivot() (just in case you know a way to go from my source to the eventual destination a bit easier than my way).
Upvotes: 1
Views: 120
Reputation: 150745
Try melt
:
(df.melt(['Column1','Identifier'], value_name='Result', var_name='Sample')
.assign(Sample=lambda x:x.Sample.str.replace('Sample ',''))
)
Or pd.wide_to_long
:
pd.wide_to_long(df, stubnames='Sample ',
i=['Column1', 'Identifier'],
j='Sample').reset_index()
Output:
Column1 Identifier Sample Result
0 11 ABC 1 1.0
1 12 ABC 1 2.0
2 13 ABC 1 6.7
3 11 DEF 1 1.0
4 12 DEF 1 2.0
5 13 DEF 1 6.7
6 11 ABC 2 1.1
7 12 ABC 2 1.9
8 13 ABC 2 2.3
9 11 DEF 2 1.1
10 12 DEF 2 1.9
11 13 DEF 2 2.3
12 11 ABC 3 1.3
13 12 ABC 3 2.2
14 13 ABC 3 8.9
15 11 DEF 3 1.3
16 12 DEF 3 2.2
17 13 DEF 3 8.9
Upvotes: 3
Reputation: 153460
Try this:
df.rename(columns=lambda x: x.split(' ')[-1])\
.melt(['Column1', 'Identifier'], var_name='Sample', value_name='Result')
Output:
Column1 Identifier Sample Result
0 11 ABC 1 1.0
1 12 ABC 1 2.0
2 13 ABC 1 6.7
3 11 DEF 1 1.0
4 12 DEF 1 2.0
5 13 DEF 1 6.7
6 11 ABC 2 1.1
7 12 ABC 2 1.9
8 13 ABC 2 2.3
9 11 DEF 2 1.1
10 12 DEF 2 1.9
11 13 DEF 2 2.3
12 11 ABC 3 1.3
13 12 ABC 3 2.2
14 13 ABC 3 8.9
15 11 DEF 3 1.3
16 12 DEF 3 2.2
17 13 DEF 3 8.9
Details:
Rename the 'Sample X' to just 'X' using split. Melt the dataframe with 'Column1' and 'Identifier' as the index Rename variable and value names.
Upvotes: 0