PharmDataSci
PharmDataSci

Reputation: 117

Reshape dataframe with multiple values columns

I am struggling a bit with some pandas reshaping. I have this dataframe:

+---------+----------+----------+----------+------------+
| Column1 | Sample 1 | Sample 2 | Sample 3 | Identifier |
+---------+----------+----------+----------+------------+
|       11 |      1.0 |      1.1 |      1.3 | ABC        |
|       12 |      2.0 |      1.9 |      2.2 | ABC        |
|       13 |      6.7 |      2.3 |      8.9 | ABC        |
|       11 |      1.0 |      1.1 |      1.3 | DEF        |
|       12 |      2.0 |      1.9 |      2.2 | DEF        |
|       13 |      6.7 |      2.3 |      8.9 | DEF        |
+---------+----------+----------+----------+------------+

And I would like to reshape it to this:

+---------+--------+------------+--------+
| Column1 | Sample | Identifier | Result |
+---------+--------+------------+--------+
|      11 |      1 | ABC        |    1.0 |
|      11 |      2 | ABC        |    1.1 |
|      11 |      3 | ABC        |    1.3 |
|      11 |      1 | DEF        |    1.0 |
|      11 |      2 | DEF        |    1.1 |
|      11 |      3 | DEF        |    1.3 |
|      12 |      1 | ABC        |    2.0 |
|      12 |      2 | ABC        |    1.9 |
|      12 |      3 | ABC        |    2.2 |
|     ... |        |            |        |
+---------+--------+------------+--------+

Essentially, it has been given in a really horrible excel sheet with repeated samples as a collection of columns.

Ideally I want Column1 as the eventual column headers, with Sample and Identifier as the index and Result as values, but I know I need it in the shape above to do that relatively straighforward with df.pivot() (just in case you know a way to go from my source to the eventual destination a bit easier than my way).

Upvotes: 1

Views: 120

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150745

Try melt:

(df.melt(['Column1','Identifier'], value_name='Result', var_name='Sample')
   .assign(Sample=lambda x:x.Sample.str.replace('Sample ',''))
)

Or pd.wide_to_long:

pd.wide_to_long(df, stubnames='Sample ',
                i=['Column1', 'Identifier'],
                j='Sample').reset_index()

Output:

    Column1 Identifier Sample  Result
0        11        ABC      1     1.0
1        12        ABC      1     2.0
2        13        ABC      1     6.7
3        11        DEF      1     1.0
4        12        DEF      1     2.0
5        13        DEF      1     6.7
6        11        ABC      2     1.1
7        12        ABC      2     1.9
8        13        ABC      2     2.3
9        11        DEF      2     1.1
10       12        DEF      2     1.9
11       13        DEF      2     2.3
12       11        ABC      3     1.3
13       12        ABC      3     2.2
14       13        ABC      3     8.9
15       11        DEF      3     1.3
16       12        DEF      3     2.2
17       13        DEF      3     8.9

Upvotes: 3

Scott Boston
Scott Boston

Reputation: 153460

Try this:

df.rename(columns=lambda x: x.split(' ')[-1])\
  .melt(['Column1', 'Identifier'], var_name='Sample', value_name='Result')

Output:

    Column1 Identifier Sample  Result
0        11        ABC      1     1.0
1        12        ABC      1     2.0
2        13        ABC      1     6.7
3        11        DEF      1     1.0
4        12        DEF      1     2.0
5        13        DEF      1     6.7
6        11        ABC      2     1.1
7        12        ABC      2     1.9
8        13        ABC      2     2.3
9        11        DEF      2     1.1
10       12        DEF      2     1.9
11       13        DEF      2     2.3
12       11        ABC      3     1.3
13       12        ABC      3     2.2
14       13        ABC      3     8.9
15       11        DEF      3     1.3
16       12        DEF      3     2.2
17       13        DEF      3     8.9

​Details:

Rename the 'Sample X' to just 'X' using split. Melt the dataframe with 'Column1' and 'Identifier' as the index Rename variable and value names.

Upvotes: 0

Related Questions