BenjaminClayton

Reputation: 37

Pandas re-ordering values in Multi-Index columns on a row-by-row basis

I have the following Multi-Index table:

                   A                   B                   C                   D
       t_1       t_2       t_1       t_2       t_1       t_2       t_1       t_2
    x    y    x    y    x    y    x    y    x    y    x    y    x    y    x    y
  2.2  5.1  3.4  1.8  1.5  6.7  8.1  7.5  6.1  2.1  9.3  7.1  8.2  1.1  1.4  2.5
  7.9  3.2  1.1  5.3  9.3  3.1  0.9  3.2  4.1  5.1  7.7  4.3  8.1  0.4  2.4  4.1

Data points (x, y) have been randomly assigned to columns A to D. I want to re-order them by the x-value at t_1, i.e. the first value in each block A to D (2.2, 1.5, 6.1 and 8.2 in the first row). The other values don't affect the ordering, but each block's values move together to the new column determined by its x-value at t_1. This means each row will be re-ordered differently.

I want some code which processes the above table to produce:

                   A                   B                   C                   D
       t_1       t_2       t_1       t_2       t_1       t_2       t_1       t_2
    x    y    x    y    x    y    x    y    x    y    x    y    x    y    x    y
  1.5  6.7  8.1  7.5  2.2  5.1  3.4  1.8  6.1  2.1  9.3  7.1  8.2  1.1  1.4  2.5
  4.1  5.1  7.7  4.3  7.9  3.2  1.1  5.3  8.1  0.4  2.4  4.1  9.3  3.1  0.9  3.2
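
To reproduce the input table at the top of the question, a minimal sketch like the following should work. It assumes the three column levels are unnamed and the rows use the default integer index (0, 1); neither detail is stated in the question.

```python
import pandas as pd

# Three column levels: block (A-D), time (t_1, t_2), coordinate (x, y).
# Unnamed levels and a default RangeIndex are assumptions, not given in the question.
columns = pd.MultiIndex.from_product(
    [list("ABCD"), ["t_1", "t_2"], ["x", "y"]]
)

data = [
    [2.2, 5.1, 3.4, 1.8, 1.5, 6.7, 8.1, 7.5, 6.1, 2.1, 9.3, 7.1, 8.2, 1.1, 1.4, 2.5],
    [7.9, 3.2, 1.1, 5.3, 9.3, 3.1, 0.9, 3.2, 4.1, 5.1, 7.7, 4.3, 8.1, 0.4, 2.4, 4.1],
]

df = pd.DataFrame(data, columns=columns)

# The sort key for each block is its x-value at t_1
print(df.xs(("t_1", "x"), axis=1, level=(1, 2)))
```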

Upvotes: 1

Views: 123

Answers (2)

Pygirl

Reputation: 13349

Try unstack and groupby (the only solution I can think of right now):

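# Long view: index = (block, t, coordinate), one column per original row
df1 = df.unstack().unstack()
for col in df1.columns:
    # Collect (block label, x-value at t_1) for every block in this row
    a = []
    for i, g in df1[col].groupby(level=0):
        a.append((i, g.iloc[0]))
    # Block labels ordered by their x-value at t_1
    get_sortedli = sorted(a, key=lambda x: x[1])
    order_col = [f1 for f1, f2 in get_sortedli]
    # Reorder this row's blocks; assumes row labels are 0..n-1 so they work with iloc
    val = df.iloc[col].reindex(order_col, level=0)
    # Assign raw values so pandas cannot realign them to the original column labels
    df.iloc[col] = val.to_numpy()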


I treated this as a problem of arranging four blocks (A, B, C, D). Once the blocks are in order for a row, take their values and assign them back to the original dataframe.
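
The same block-arrangement idea can also be sketched with NumPy instead of a Python loop. This is not the code from the answer above, just one possible vectorised variant. It assumes every block has the same width, that the columns of a block are contiguous with (t_1, x) first, and that all values are numeric. The function name `sort_blocks_by_first_value` is made up for illustration.

```python
import numpy as np
import pandas as pd


def sort_blocks_by_first_value(df, n_blocks):
    """Reorder equal-width column blocks within each row by each block's first value."""
    values = df.to_numpy()
    n_rows = values.shape[0]
    # View each row as blocks: shape (rows, blocks, values per block)
    blocks = values.reshape(n_rows, n_blocks, -1)
    # Per row, the block order that sorts the first value of each block (x at t_1)
    order = np.argsort(blocks[:, :, 0], axis=1, kind="stable")
    # Move whole blocks into that order, independently for every row
    sorted_blocks = np.take_along_axis(blocks, order[:, :, None], axis=1)
    # Flatten back and reuse the original row index and column MultiIndex
    return pd.DataFrame(
        sorted_blocks.reshape(n_rows, -1), index=df.index, columns=df.columns
    )


# Sample data from the question
columns = pd.MultiIndex.from_product([list("ABCD"), ["t_1", "t_2"], ["x", "y"]])
df = pd.DataFrame(
    [
        [2.2, 5.1, 3.4, 1.8, 1.5, 6.7, 8.1, 7.5, 6.1, 2.1, 9.3, 7.1, 8.2, 1.1, 1.4, 2.5],
        [7.9, 3.2, 1.1, 5.3, 9.3, 3.1, 0.9, 3.2, 4.1, 5.1, 7.7, 4.3, 8.1, 0.4, 2.4, 4.1],
    ],
    columns=columns,
)

result = sort_blocks_by_first_value(df, n_blocks=4)
print(result)
```

Unlike the loop, this returns a new frame and leaves `df` unchanged. The column labels stay as A to D, so "A" afterwards just means "the block with the smallest x at t_1 in that row".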


Upvotes: 2

aneroid

Reputation: 15962

Here's an option involving mostly meddling with the shape of the data, sorting and then using the re-shaped values and original df columns (a MultiIndex) to create the final dataframe:

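import pandas as pd

# One line per (original row, block): columns are (t, coordinate) plus the original row number
df2 = df.T.unstack(level=0).T.reset_index(level=0, col_fill='row')
# Within each original row, order the blocks by their x-value at t_1
df2 = df2.sort_values([('level_0', 'row'), ('t_1', 'x')], ignore_index=True)
values = df2.drop(('level_0', 'row'), axis=1).values.reshape(len(df), -1)  # one line per original row
df3 = pd.DataFrame(data=values, columns=df.columns)  # using original df's columns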

Output:

                    A                   B                   C                   D
        t_1       t_2       t_1       t_2       t_1       t_2       t_1       t_2
     x    y    x    y    x    y    x    y    x    y    x    y    x    y    x    y
0  1.5  6.7  8.1  7.5  2.2  5.1  3.4  1.8  6.1  2.1  9.3  7.1  8.2  1.1  1.4  2.5
1  4.1  5.1  7.7  4.3  7.9  3.2  1.1  5.3  8.1  0.4  2.4  4.1  9.3  3.1  0.9  3.2

In a more readable but inaccurate table format:

                   A                   B                   C                   D
       t_1       t_2       t_1       t_2       t_1       t_2       t_1       t_2
    x    y    x    y    x    y    x    y    x    y    x    y    x    y    x    y
  1.5  6.7  8.1  7.5  2.2  5.1  3.4  1.8  6.1  2.1  9.3  7.1  8.2  1.1  1.4  2.5
  4.1  5.1  7.7  4.3  7.9  3.2  1.1  5.3  8.1  0.4  2.4  4.1  9.3  3.1  0.9  3.2

Upvotes: 2
