I wanted to count the number of items for each sale_id, so I decided to use the count function. The idea was to have item_numbers as the last column without affecting the original column ordering of salesDf.

But after the join, the sale_id column became the first column in df3. To fix this, I tried .select(salesDf.schema.fieldNames.map(col):_*). However, after that the item_numbers column is missing (while the ordering of the other columns is correct).

How do I preserve the correct ordering while keeping the item_numbers column at the same time?
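import org.apache.spark.sql.functions.{col, count}
val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
// salesDf.schema.fieldNames lists only salesDf's own columns, so item_numbers is not selected
val df3 = salesDf.join(df2, "sale_id").select(salesDf.schema.fieldNames.map(col):_*)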
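A minimal sketch that reproduces the problem, assuming a hypothetical salesDf with the columns item_id, sale_id and price (the sample rows and the JoinColumnOrder/joinCounts names are illustrative, not from the question). It prints the column list right after the join and again after the select: the using-column join puts sale_id first, then the remaining salesDf columns, then item_numbers, and selecting salesDf.schema.fieldNames keeps only salesDf's own columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

object JoinColumnOrder {
  // Hypothetical helper: the question's aggregation and join, before any select
  def joinCounts(salesDf: DataFrame): DataFrame = {
    val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
    salesDf.join(df2, "sale_id")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only
    val spark = SparkSession.builder().master("local[1]").appName("join-column-order").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: sale_id is deliberately not the first column
    val salesDf = Seq(("a1", 1, 9.99), ("a2", 1, 4.50), ("b1", 2, 12.00))
      .toDF("item_id", "sale_id", "price")

    val joined = joinCounts(salesDf)
    // Join key first, then the other salesDf columns, then the columns from df2
    println(joined.columns.mkString(", "))

    // Selecting salesDf's field names restores their order but drops item_numbers
    val df3 = joined.select(salesDf.schema.fieldNames.map(col): _*)
    println(df3.columns.mkString(", "))

    spark.stop()
  }
}
```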
To preserve salesDf's column order in the final result, you could assemble the column list for select as follows:
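import org.apache.spark.sql.functions.{col, count}
val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
// The join moves sale_id to the front; the select below restores the original order
val df3 = salesDf.join(df2, "sale_id")
// salesDf's own columns in their original order, with item_numbers appended last
val orderedCols = salesDf.columns :+ "item_numbers"
val resultDF = df3.select(orderedCols.map(col): _*)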
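For an end-to-end check, here is a sketch that wraps the join-based fix in a function and, for comparison, shows a different technique: a window count, which avoids the join entirely because withColumn adds a new column after the existing ones. The ItemCounts object, the function names and the sample data are hypothetical. One behavioral difference to be aware of: the inner join drops rows whose sale_id is null (null keys never match), while the window version keeps them and counts them as one group.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count}

object ItemCounts {
  // Join-based fix: append item_numbers to salesDf's own column list
  def withCountJoin(salesDf: DataFrame): DataFrame = {
    val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
    val orderedCols = salesDf.columns :+ "item_numbers"
    salesDf.join(df2, "sale_id").select(orderedCols.map(col): _*)
  }

  // Window-based alternative: count item_id over all rows sharing a sale_id;
  // with no orderBy, the window frame covers the whole partition
  def withCountWindow(salesDf: DataFrame): DataFrame =
    salesDf.withColumn("item_numbers", count("item_id").over(Window.partitionBy("sale_id")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("item-counts").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with sale_id in the middle
    val salesDf = Seq(("a1", 1, 9.99), ("a2", 1, 4.50), ("b1", 2, 12.00))
      .toDF("item_id", "sale_id", "price")

    withCountJoin(salesDf).show()
    withCountWindow(salesDf).show()
    spark.stop()
  }
}
```

Both functions keep salesDf's columns in their original order and put item_numbers last.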