Reputation: 165
Imagine I have the following column names for a pyspark dataframe:
Naturally, PySpark orders them by 0, 1, 2, etc. However, I want one of the following orders: 0_0; 0_1; 1_0; 1_1; 2_0; 2_1 OR INSTEAD 0_0; 1_0; 2_0; 3_0; 4_0; (...); 0_1; 1_1; 2_1; 3_1; 4_1 (either would be fine by me).
Can anyone help me with this?
Upvotes: 0
Views: 263
Reputation: 42402
You can sort the column names according to the number before and after the underscore:
df2 = df.select(
    'id',
    *sorted(
        df.columns[1:],
        key=lambda c: (int(c.split('_')[0]), int(c.split('_')[1]))
    )
)
To get the other desired output, just swap the indices 0 and 1 in the sort key above, so that the columns are ordered by the number after the underscore first.
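To check the sort key without a Spark session, here is a minimal sketch on a plain Python list. The column names are made up for illustration (the real ones are not shown in the question), and `split_key` is a hypothetical helper, not part of PySpark.

```python
# Hypothetical column names of the form "<number>_<number>".
cols = ["0_0", "1_0", "2_0", "10_0", "0_1", "1_1", "2_1", "10_1"]


def split_key(name):
    """Turn '10_1' into (10, 1) so sorting is numeric, not lexicographic."""
    before, after = name.split("_")
    return int(before), int(after)


# Order by the number before the underscore, then the one after it.
by_prefix = sorted(cols, key=split_key)

# Order by the number after the underscore first (indices swapped).
by_suffix = sorted(cols, key=lambda c: split_key(c)[::-1])

print(by_prefix)
print(by_suffix)
```

Either list could then be passed on as `df.select('id', *by_prefix)`, assuming `id` is the first column as in the code above. Converting to `int` matters once the numbers reach two digits, since a plain string sort would put `10_0` before `2_0`.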
Upvotes: 1