Pandas long reshape for several variables

Question

I want to reshape my long dataframe to wide by sorting it by Session. In this example Session is from 1-10.

       Session  Tube  Window  Counts  Length
0            1     1       1     0.0     0.0
1            1     1       2     0.0     0.0
2            1     1       3     0.0     0.0
3            1     1       4     0.0     0.0
4            1     1       5     0.0     0.0
...        ...   ...     ...     ...     ...
17995       10    53      36     0.0     0.0
17996       10    53      37     0.0     0.0
17997       10    53      38     0.0     0.0
17998       10    53      39     0.0     0.0
17999       10    53      40     0.0     0.0

What I am expecting is something like:

       Session  Tube  Window  Counts_1  Length_1           Session   Counts_2  Length_2
0            1     1       1     0.0     0.0    0                 2      0.0     0.0
1            1     1       2     0.0     0.0    1                 2      0.0     0.0
2            1     1       3     0.0     0.0    2                 2      0.0     0.0
3            1     1       4     0.0     0.0    3                 2      0.0     0.0
4            1     1       5     0.0     0.0    4                 2      0.0     0.0
...        ...   ...     ...     ...     ...    ...           ...   ...     ...     ...     ...
17995       10    53      36     0.0     0.0

I could not find the solution. What I tried leads to a complete wide dataset.

df['idx'] = df.groupby('Session').cumcount()+1
df = df.pivot_table(index=['Session'], columns='idx', 
                    values=['Counts', 'Length'], aggfunc='first')
df = df.sort_index(axis=1, level=1)
df.columns = [f'{x}_{y}' for x,y in df.columns]
df = df.reset_index()

   Session  Counts_1   Length_1  Counts_2   Length_2  Counts_3   Length_3  Counts_4   Length_4  Counts_5   Length_5  ...  Length_1795  Counts_1796  Length_1796  Counts_1797  Length_1797  Counts_1798  Length_1798  Counts_1799  Length_1799  Counts_1800  Length_1800
0        1       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
1        2       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
2        3       0.0   6.892889       0.0   2.503830       0.0   3.108580       0.0   5.188438       0.0   9.779242  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
3        4       1.0  12.787159       0.0  13.847412       7.0  44.928269       0.0  48.511435       2.0  33.264356  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
4        5       0.0  13.345436       2.0  27.415005      20.0  83.130315      19.0  85.475996       2.0  10.147958  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
5        6       2.0  13.141503       8.0  22.965002       5.0  48.737279      15.0  85.403915       1.0  17.414609  ...     0.000000          6.0    12.399834          0.0     0.710808          0.0     0.000000          0.0     1.661978          0.0     0.000000
6        7       1.0   7.852842       0.0  13.613426      14.0  46.148978      23.0  87.446535       0.0  13.759176  ...     2.231295          8.0    39.022340          1.0     7.304392          3.0     9.228959          0.0     6.885822          0.0     1.606200
7        8       0.0   0.884018       3.0  35.323813       8.0  32.846301      10.0  71.691744       0.0   4.310296  ...     2.753615          6.0    25.003670          6.0    22.113324          0.0     0.615790          0.0    11.812815          2.0     9.991712
8        9       4.0  24.700817      13.0  31.637755       3.0  30.312104       5.0  50.490115       0.0   3.830024  ...     5.977912         11.0    44.305738          1.0    13.523643          0.0     1.374856          1.0     9.066218          1.0     8.376995
9       10       0.0  17.651236      10.0  44.311858      29.0  55.415964      12.0  43.457016       1.0  41.503212  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000

Serge Ballesta · Accepted Answer

You could try to pivot your dataframe, after building a custom index per session:

df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
    'index', 'Session', ['Tube',  'Window', 'Counts', 'Length']).rename_axis(index=None)

With you sample data it would give:

        Tube       Window       Counts      Length     
Session   1     10     1     10     1    10     1    10
0        1.0  53.0    1.0  36.0    0.0  0.0    0.0  0.0
1        1.0  53.0    2.0  37.0    0.0  0.0    0.0  0.0
2        1.0  53.0    3.0  38.0    0.0  0.0    0.0  0.0
3        1.0  53.0    4.0  39.0    0.0  0.0    0.0  0.0
4        1.0  53.0    5.0  40.0    0.0  0.0    0.0  0.0

Not that bad but we have a MultiIndex for the columns and in a wrong order. Let us go further:

df2.columns = df2.columns.to_flat_index()
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))

We now have:

   (Tube, 1)  (Window, 1)  ...  (Counts, 10)  (Length, 10)
0        1.0          1.0  ...           0.0           0.0
1        1.0          2.0  ...           0.0           0.0
2        1.0          3.0  ...           0.0           0.0
3        1.0          4.0  ...           0.0           0.0
4        1.0          5.0  ...           0.0           0.0

Last step:

df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))

to finaly get:

   Tube_1  Window_1  Counts_1  ...  Window_10  Counts_10  Length_10
0     1.0       1.0       0.0  ...       36.0        0.0        0.0
1     1.0       2.0       0.0  ...       37.0        0.0        0.0
2     1.0       3.0       0.0  ...       38.0        0.0        0.0
3     1.0       4.0       0.0  ...       39.0        0.0        0.0
4     1.0       5.0       0.0  ...       40.0        0.0        0.0

Pandas long reshape for several variables

Answers (1)

Related Questions