rwb
rwb

Reputation: 4478

Convert data in "grouped" columns from long to wide / square format pd.DataFrame

Not quite sure how to describe what I'm looking for, so hopefully an example will help.

enter image description here

So, for example, in the first row (index=0), there are two observations, at id=5 (x=a, y=0) and at id=9 (x=a, y=1)

I want to reshape / pivot the data such that there is a column for every location with the corresponding x, y values (can be NA).

In this example, that would look like this

enter image description here

Here, looking at the first row again, (id=5, x=a, y=0) has translated -> (x_5=a, y_5=0) and (id=9, x=a, y=1) -> (x_9=a, y_9=1). There are no observations for any other locations. So those are <NA>

Have been trying all combinations of .pivot I can think of but can't manage it.

MRE to produce the example DataFrame's:

import pandas as pd

df_before = pd.DataFrame(
    [
        [5, 'a', '0', 9, 'a', '1'],
        [5, 'b', '3', 6, 'c', '3'],
        [9, 'c', '4'],
        [2, 'd', '12', 6, 'a', '3', 9, 'c', '2', 4, 'a', '3'],
        [6, 'a', '3'],
        [5, 'b', '0', 9, 'a', '4', 4, 'd', '9'],
        [2, 'c', '2', 9, 'g', '5', 4, 'g', '8']
    ],
    columns=['id_0','x_0','y_0', 'id_1', 'x_1', 'y_1', 'id_2', 'x_2', 'y_2', 'id_3', 'x_3', 'y_3']
).fillna(pd.NA)

df_after = pd.DataFrame(
    [
        [pd.NA, pd.NA, pd.NA, pd.NA, 'a', '0', pd.NA, pd.NA, 'a', '1'],
        [pd.NA, pd.NA, pd.NA, pd.NA, 'b', '3', 'c', '3', pd.NA, pd.NA],
        [pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, 'c', '4'],
        ['d', '12', 'a', '3', pd.NA, pd.NA, 'a', '3', 'c', '2'],
        [pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, 'a', '3', pd.NA, pd.NA],
        [pd.NA, pd.NA, 'd', '9', 'b', '0', pd.NA, pd.NA, 'a', '4'],
        ['c', '2', 'g', '8', pd.NA, pd.NA, pd.NA, pd.NA, 'g', '5']
    ],
    columns=['x_2', 'y_2', 'x_4', 'y_4', 'x_5', 'y_5', 'x_6', 'y_6', 'x_9', 'y_9']
)

Upvotes: 2

Views: 74

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195418

Try:

def fn(x):
    m = x.notna() & ~x.index.str.startswith("id")
    return {
        f"{a}_" + str(int(x[f"id_{b}"])): v
        for (a, b), v in zip(map(lambda v: v.split("_"), x[m].index), x[m])
    }

x = pd.DataFrame(df_before.apply(fn, axis=1).to_list())
print(x[sorted(x.columns, key=lambda v: int(v.split("_")[1]))])

Prints:

   x_2  y_2  x_4  y_4  x_5  y_5  x_6  y_6  x_9  y_9
0  NaN  NaN  NaN  NaN    a    0  NaN  NaN    a    1
1  NaN  NaN  NaN  NaN    b    3    c    3  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN    c    4
3    d   12    a    3  NaN  NaN    a    3    c    2
4  NaN  NaN  NaN  NaN  NaN  NaN    a    3  NaN  NaN
5  NaN  NaN    d    9    b    0  NaN  NaN    a    4
6    c    2    g    8  NaN  NaN  NaN  NaN    g    5

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150735

Your operation is essentially melt/wide_to_long, and pivot back:

out = (pd.wide_to_long(df_before.reset_index(), 
                stubnames=['id_','x_','y_'],
                i='index', 
                j='old_id'
                )
       .groupby(['index','id_']).first().unstack('id_')
       .sort_index(level=[1,0], axis=1)
)
out.columns = [f'{x}{int(y)}' for x,y in out.columns]

Output:

       x_2  y_2  x_4  y_4  x_5  y_5  x_6  y_6  x_9  y_9
index                                                  
0      NaN  NaN  NaN  NaN    a    0  NaN  NaN    a    1
1      NaN  NaN  NaN  NaN    b    3    c    3  NaN  NaN
2      NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN    c    4
3        d   12    a    3  NaN  NaN    a    3    c    2
4      NaN  NaN  NaN  NaN  NaN  NaN    a    3  NaN  NaN
5      NaN  NaN    d    9    b    0  NaN  NaN    a    4
6        c    2    g    8  NaN  NaN  NaN  NaN    g    5

Upvotes: 2

Related Questions