Sort column names in specific order

Question

Imagine I have the following column names for a pyspark dataframe:

By default, PySpark orders them as 0, 1, 2, etc. However, I want one of the following orders: 0_0; 0_1; 1_0; 1_1; 2_0; 2_1 or, alternatively, 0_0; 1_0; 2_0; 3_0; 4_0; (...); 0_1; 1_1; 2_1; 3_1; 4_1. Either would be fine.

Can anyone help me with this?

mck · Accepted Answer

You can sort the column names numerically, first by the number before the underscore and then by the number after it:

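# Keep 'id' first (this assumes it is the first column), then sort the
# remaining columns by (number before '_', number after '_') as integers.
df2 = df.select(
    'id',
    *sorted(
        df.columns[1:],
        key=lambda c: (int(c.split('_')[0]), int(c.split('_')[1]))
    )
)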

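This gives the first ordering (0_0; 0_1; 1_0; 1_1; ...). To get the other one (0_0; 1_0; 2_0; ...; 0_1; 1_1; ...), swap the indices 0 and 1 in the sort key, so that the number after the underscore is compared first.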
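To see what each sort key does without starting a Spark session, here is a minimal sketch on a plain Python list. The column names and the helper names `prefix_first` and `suffix_first` are made up for illustration; the only assumption carried over from the answer is that `id` is the first column and every other name has the form `<int>_<int>`. A plain string sort is included for comparison, since it places `10_0` before `2_0`, which is why the keys convert to `int`.

```python
# Hypothetical column names; 'id' comes first, matching df.columns[1:] in the answer.
columns = ['id', '0_1', '10_0', '2_1', '0_0', '2_0', '10_1']


def prefix_first(name):
    """Sort key: number before the underscore, then the number after it."""
    before, after = name.split('_')
    return (int(before), int(after))


def suffix_first(name):
    """Swapped sort key: number after the underscore, then the number before it."""
    before, after = name.split('_')
    return (int(after), int(before))


data_columns = columns[1:]  # everything except 'id'

by_prefix = sorted(data_columns, key=prefix_first)
by_suffix = sorted(data_columns, key=suffix_first)
as_strings = sorted(data_columns)  # plain string sort, for comparison only

print(by_prefix)   # ['0_0', '0_1', '2_0', '2_1', '10_0', '10_1']
print(by_suffix)   # ['0_0', '2_0', '10_0', '0_1', '2_1', '10_1']
print(as_strings)  # ['0_0', '0_1', '10_0', '10_1', '2_0', '2_1']
```

Either list can then be passed to `select`, for example `df.select('id', *by_suffix)`, assuming `df` has these columns.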
