Reputation: 594

Pandas reshape sets of columns as rows

I have a dataframe that looks roughly like this:

A1 B1 C1 A4 B4 C4 A7 B7 C7
A2 B2 C2 A5 B5 C5 A8 B8 C8
A3 B3 C3 A6 B6 C6 A9 B9 C9

that I would like to get looking like this:

A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4
A5 B5 C5
A6 B6 C6
A7 B7 C7
A8 B8 C8
A9 B9 C9

Is there anything built into pandas or another data processing library that can easily do this without manually traversing the rows three times (in this example), once per "column set"? This would essentially be a 3-column pivot.

Upvotes: 3

Answers (4)

manwithfewneeds

Reputation: 1167

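You could reconstruct the df by grouping the values by their leading letter. Note that this relies on the values sorting lexicographically into the order you want (a value such as A10 would sort before A2):

import pandas as pd
from itertools import chain

# Every distinct alphabetic character found in the cells, e.g. ['A', 'B', 'C']
letters = sorted(set(j for i in chain(*df.values) for j in i if j.isalpha()))
# One sorted list of values per letter, used as that letter's column
v = {letter: sorted(i for i in chain(*df.values) if i.startswith(letter)) for letter in letters}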

dff = pd.DataFrame(v)
print(dff)

    A   B   C
0  A1  B1  C1
1  A2  B2  C2
2  A3  B3  C3
3  A4  B4  C4
4  A5  B5  C5
5  A6  B6  C6
6  A7  B7  C7
7  A8  B8  C8
8  A9  B9  C9

Upvotes: 0

jezrael

Reputation: 862671

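Use DataFrame.stack with a MultiIndex created by integer division and modulo:

import numpy as np

c = np.arange(len(df.columns))

# Level 0 is the group number (c // 3), level 1 the position within the group (c % 3)
df.columns = [c // 3, c % 3]
# Move the group level into the rows, then sort by group so each block stays together
df1 = df.stack(0).sort_index(level=1).reset_index(drop=True)
print(df1)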
    0   1   2
0  A1  B1  C1
1  A2  B2  C2
2  A3  B3  C3
3  A4  B4  C4
4  A5  B5  C5
5  A6  B6  C6
6  A7  B7  C7
7  A8  B8  C8
8  A9  B9  C9

Upvotes: 1

Adam

Reputation: 46

I'm not very experienced with pandas, so I don't know the exact syntax, but you could split the original dataframe into 3 chunks of columns and then concatenate them vertically (along the row axis) into your desired dataframe.

So it could be separated into:

A1 B1 C1
A2 B2 C2
A3 B3 C3

A4 B4 C4
A5 B5 C5
A6 B6 C6

A7 B7 C7
A8 B8 C8
A9 B9 C9
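
A minimal sketch of that split-and-concatenate idea, assuming the frame is called `df` and its column count is a multiple of the group width `n`. The helper name `stack_column_groups` is made up for illustration, and the chunks are relabelled `0..n-1` so that `pd.concat` lines them up:

```python
import pandas as pd


def stack_column_groups(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Split df into consecutive blocks of n columns and stack them vertically."""
    if df.shape[1] % n != 0:
        raise ValueError("column count must be a multiple of n")
    chunks = []
    for start in range(0, df.shape[1], n):
        chunk = df.iloc[:, start:start + n].copy()
        # Give every chunk the same column labels so concat aligns them.
        chunk.columns = range(n)
        chunks.append(chunk)
    # axis=0 puts the chunks on top of each other; ignore_index renumbers the rows.
    return pd.concat(chunks, axis=0, ignore_index=True)


# Example frame from the question (default integer column labels).
df = pd.DataFrame([
    ["A1", "B1", "C1", "A4", "B4", "C4", "A7", "B7", "C7"],
    ["A2", "B2", "C2", "A5", "B5", "C5", "A8", "B8", "C8"],
    ["A3", "B3", "C3", "A6", "B6", "C6", "A9", "B9", "C9"],
])
result = stack_column_groups(df, 3)
print(result)
```

This stays entirely in pandas, so it keeps per-column dtypes, at the cost of one `iloc` slice per chunk.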

Upvotes: 0

user3483203

Reputation: 51155

`reshape` + `swapaxes` + `reshape`

df.values.reshape(-1, 3, 3).swapaxes(1, 0).reshape(-1, 3)

array([['A1', 'B1', 'C1'],
       ['A2', 'B2', 'C2'],
       ['A3', 'B3', 'C3'],
       ['A4', 'B4', 'C4'],
       ['A5', 'B5', 'C5'],
       ['A6', 'B6', 'C6'],
       ['A7', 'B7', 'C7'],
       ['A8', 'B8', 'C8'],
       ['A9', 'B9', 'C9']], dtype=object)

To make this more general, you can calculate the reshape dimensions from your grouping. For example, say you want to group every 4 columns in the following frame:

A1 B1 C1 D1 A4 B4 C4 D4 A7 B7 C7 D7
A2 B2 C2 D2 A5 B5 C5 D5 A8 B8 C8 D8
A3 B3 C3 D3 A6 B6 C6 D6 A9 B9 C9 D9

n = 4
f = df.shape[1] // n

df.values.reshape(-1, f, n).swapaxes(1, 0).reshape(-1, n)

array([['A1', 'B1', 'C1', 'D1'],
       ['A2', 'B2', 'C2', 'D2'],
       ['A3', 'B3', 'C3', 'D3'],
       ['A4', 'B4', 'C4', 'D4'],
       ['A5', 'B5', 'C5', 'D5'],
       ['A6', 'B6', 'C6', 'D6'],
       ['A7', 'B7', 'C7', 'D7'],
       ['A8', 'B8', 'C8', 'D8'],
       ['A9', 'B9', 'C9', 'D9']], dtype=object)
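
If you want a DataFrame back rather than a bare array, the same steps can be wrapped in a small helper. This is only a sketch: the function name `regroup_columns` and the optional `columns` argument are my own additions, and it assumes the number of columns is an exact multiple of `n`:

```python
import pandas as pd


def regroup_columns(df, n, columns=None):
    """Stack consecutive groups of n columns on top of each other."""
    rows, cols = df.shape
    if cols % n != 0:
        raise ValueError("number of columns must be a multiple of n")
    f = cols // n  # number of column groups
    # (rows, f, n): axis 0 = original row, axis 1 = group, axis 2 = column in group
    arr = df.to_numpy().reshape(rows, f, n)
    # Swap so the group comes first, then flatten group and row into one axis.
    arr = arr.swapaxes(0, 1).reshape(-1, n)
    return pd.DataFrame(arr, columns=columns)


# The 4-column-group frame from above, built from whitespace-separated text.
lines = [
    "A1 B1 C1 D1 A4 B4 C4 D4 A7 B7 C7 D7",
    "A2 B2 C2 D2 A5 B5 C5 D5 A8 B8 C8 D8",
    "A3 B3 C3 D3 A6 B6 C6 D6 A9 B9 C9 D9",
]
wide = pd.DataFrame([line.split() for line in lines])
out = regroup_columns(wide, 4, columns=list("ABCD"))
print(out)
```

Note that `to_numpy()` on a frame with mixed column types produces a single `object` array, so the result may need its dtypes converted afterwards.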

Working on the underlying array is going to be quite a fast approach. Timings on the example frame repeated 500 times, compared with the stack-based solution:

df = pd.concat([df]*500)

In [128]: %%timeit
     ...: n = 3
     ...: f = df.shape[1] // n
     ...: df.values.reshape(-1, f, n).swapaxes(1, 0).reshape(-1, n)
     ...:
77.4 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [129]: %%timeit
     ...: c = np.arange(len(df.columns))
     ...: df.columns = [c // 3, c % 3]
     ...: df1 = df.stack(0).sort_index(level=1).reset_index(drop=True)
     ...:
     ...:
12.2 ms ± 326 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 2
