jest jest

Reputation: 135

Spark Dataframes: How can I change the order of columns in Java/Scala?

After joining two dataframes, I find that the column order is not what I expected.

Ex: Joining two data frames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].

How can I change the order of the columns (e.g., [a,b,c,d,e])? I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?

Upvotes: 4

Views: 12744

Answers (2)

chucknelson

Reputation: 2336

In Scala you can use the "splat" (: _*) syntax to pass a variable-length list of columns to the DataFrame.select() method.

To address your example, you can get a list of the existing columns via DataFrame.columns, which returns an array of strings. Then just sort that array and convert the values to columns. You can then "splat" out to the select() method:

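import org.apache.spark.sql.functions.col

// Sort the column names alphabetically, then turn each name into a Column
val mySortedCols = myDF.columns.sorted.map(str => col(str))
// Array[String]=(b,a,c,d,e) => Array[Column]=(a,b,c,d,e)

// Expand the array into varargs for select()
val myNewDF = myDF.select(mySortedCols: _*)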
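
Alphabetical sorting only helps when the order you want happens to be alphabetical. For an arbitrary order, one option is to compute the list of names yourself and splat that instead. The following is a minimal sketch in plain Scala (no Spark needed for the helper itself); ColumnOrder and moveToFront are hypothetical names made up for this illustration, not part of the Spark API.

```scala
// Hypothetical helper (not part of Spark): builds a new column order
// from the names returned by DataFrame.columns.
object ColumnOrder {
  /** Puts the columns named in `first` at the front, in the given order,
    * and keeps every other column in its existing relative order. */
  def moveToFront(columns: Seq[String], first: Seq[String]): Seq[String] = {
    // Fail early if a requested column does not exist
    val missing = first.filterNot(columns.contains)
    require(missing.isEmpty, s"Unknown columns: ${missing.mkString(", ")}")
    first ++ columns.filterNot(first.contains)
  }
}
```

With a DataFrame in scope, the result can be mapped to columns and passed to select() in the same way, for example myDF.select(ColumnOrder.moveToFront(myDF.columns, Seq("a")).map(col): _*), assuming org.apache.spark.sql.functions.col is imported.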

Upvotes: 8

Kestemont Max

Reputation: 1422

One way of doing it is reordering after your join:

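// toDF needs the session implicits; spark-shell imports them automatically,
// elsewhere add: import spark.implicits._
case class Person(name: String, age: Int)
val persons = Seq(Person("test", 10)).toDF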

persons.show
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+

persons.select("age", "name").show

+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+
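
Applied to the join from the question, the same idea could look roughly like the sketch below. It assumes a local SparkSession and the Spark SQL dependency on the classpath; the object name ReorderAfterJoin and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example: join two DataFrames, then fix the column order.
object ReorderAfterJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reorder-after-join")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the column layouts from the question
    val left  = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
    val right = Seq(("a1", 1)).toDF("a", "b")

    // Join on the shared column b; the resulting order is decided by Spark
    val joined = left.join(right, Seq("b"))

    // Select the columns explicitly, in the order you want
    val reordered = joined.select("a", "b", "c", "d", "e")
    println(reordered.columns.mkString(", "))

    spark.stop()
  }
}
```

Listing the names explicitly is the simplest choice when the set of columns is small and known in advance; for wide or changing schemas, deriving the order from the columns array is less brittle.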

Upvotes: 2
