Reputation: 1029
I have a requirement to change column positions frequently. Instead of changing the code, I created a temporary DataFrame, Index_df, in which I update the column positions; those positions should then be applied to the actual DataFrame.
sample_df:
F_cDc  F_NHY  F_XUI  F_NMY  P_cDc  P_NHY  P_XUI  P_NMY
415    258    854    245    478    278    874    235
405    197    234    456    567    188    108    267
315    458    054    375    898    978    677    134
Index_df:
col    position
F_cDc  1
F_NHY  3
F_XUI  5
F_NMY  7
P_cDc  2
P_NHY  4
P_XUI  6
P_NMY  8
The columns of sample_df should be reordered according to the positions in Index_df.
Expected output:
F_cDc  P_cDc  F_NHY  P_NHY  F_XUI  P_XUI  F_NMY  P_NMY
415    478    258    278    854    874    245    235
405    567    197    188    234    108    456    267
315    898    458    978    054    677    375    134
Here the columns follow the positions I set in Index_df.
I could write sample_df.select("<column order>"), but I have more than 70 columns, so hard-coding the order is not a good approach.
Upvotes: 0
Views: 649
Reputation: 15258
You can easily achieve that with select.
First, retrieve your columns in the right order:
NewColList = Index_df.orderBy("position").select("col").collect()
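One caveat with the orderBy("position") step: if Index_df is loaded from a CSV without schema inference, position is a string column and sorts lexicographically, so with more than 9 columns "10" lands before "2". The plain-Python sketch below uses hypothetical (col, position) pairs, not the real data, to show the difference. On the Spark side the corresponding fix would be ordering on a cast, e.g. Index_df.orderBy(F.col("position").cast("int")) with from pyspark.sql import functions as F.

```python
# Hypothetical (col, position) pairs with positions stored as strings,
# as they would be if Index_df were read from CSV without inferSchema.
pairs = [("A", "1"), ("B", "2"), ("C", "10"), ("D", "3")]

# Sorting on the raw string compares character by character, so "10" < "2".
lexicographic = [name for name, pos in sorted(pairs, key=lambda p: p[1])]

# Sorting on the integer value gives the intended column order.
numeric = [name for name, pos in sorted(pairs, key=lambda p: int(p[1]))]

print(lexicographic)  # ['A', 'C', 'B', 'D']
print(numeric)        # ['A', 'B', 'D', 'C']
```

With the 8 columns in the question every position is a single digit, so the two orders coincide; the difference only appears once positions reach 10.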
Then apply the new order to your DataFrame:
sample_df = sample_df.select(*[i[0] for i in NewColList])
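A name in Index_df that does not exist in sample_df makes select fail, while a column missing from Index_df is silently dropped, so it can help to check the collected list against sample_df.columns before selecting. Below is a minimal sketch in plain Python, assuming Index_df is small enough to collect() and has exactly the two columns col and position in that order; the helper name ordered_columns is hypothetical. Collected Row objects are tuple subclasses, so they unpack like the (name, position) pairs used here.

```python
def ordered_columns(current_columns, index_rows):
    """Return column names sorted by position, after checking that the
    index covers exactly the DataFrame's columns.

    current_columns: list of names, e.g. sample_df.columns
    index_rows: iterable of (col, position) pairs, e.g. Index_df.collect()
    """
    # int() also handles positions that were read as strings.
    pairs = [(name, int(pos)) for name, pos in index_rows]
    names = [name for name, _ in pairs]
    positions = [pos for _, pos in pairs]

    if len(set(names)) != len(names) or len(set(positions)) != len(positions):
        raise ValueError("duplicate column name or position in index")

    missing = set(current_columns) - set(names)   # would be dropped by select
    unknown = set(names) - set(current_columns)   # would make select fail
    if missing or unknown:
        raise ValueError(f"missing from index: {sorted(missing)}, "
                         f"not in DataFrame: {sorted(unknown)}")

    return [name for name, _ in sorted(pairs, key=lambda p: p[1])]


columns = ["F_cDc", "F_NHY", "F_XUI", "F_NMY", "P_cDc", "P_NHY", "P_XUI", "P_NMY"]
index = [("F_cDc", 1), ("F_NHY", 3), ("F_XUI", 5), ("F_NMY", 7),
         ("P_cDc", 2), ("P_NHY", 4), ("P_XUI", 6), ("P_NMY", 8)]
print(ordered_columns(columns, index))
# ['F_cDc', 'P_cDc', 'F_NHY', 'P_NHY', 'F_XUI', 'P_XUI', 'F_NMY', 'P_NMY']
```

With Spark, the call would be sample_df = sample_df.select(*ordered_columns(sample_df.columns, Index_df.collect())). Sorting on the driver here also sidesteps the string-ordering issue, at the cost of pulling Index_df to the driver, which is fine for a table of 70-odd rows.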
Upvotes: 6