Alice

Reputation: 175

Rename the columns of a dataframe based on another dataframe using Scala

I am trying to rename the columns of a data frame based on another data frame. How can I achieve this using Scala?

Essentially, my data looks like this:

DataFrame1

A    B    C   D
1    2    3   4

I have another table, DataFrame2, that looks like this:

Col1    Col2
A       E
B       Q
C       R
D       Z

I want to rename the columns of the first data frame according to the second one, so that the expected output looks like this:

E    Q    R    Z
1    2    3    4

I have tried the following PySpark code (copied from this answer by user8371915), and it works fine:

name_dict = dataframe2.rdd.collectAsMap()

dataframe1.select([dataframe1[c].alias(name_dict.get(c, c)) for c in dataframe1.columns]).show()

Now, how can I achieve this using Scala?

Upvotes: 0

Views: 2619

Answers (2)

Anurag Sharma

Reputation: 2605

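For Spark 1.6, as required. Note that SparkSession, used below, only exists from Spark 2.0 onward; on 1.6 itself, create the DataFrames through an SQLContext instead.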

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ColumnNameChange {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .appName("SparkSessionExample")
      .config("spark.master", "local")
      .getOrCreate()

    import spark.implicits._

    val df1 = Seq((1, 2, 3, 4)).toDF("A","B","C","D")
    val df2 = Seq(("A", "E"),("B","Q"), ("C", "R"),("D","Z")).toDF("Col1","Col2")


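    // Collect the (old name -> new name) pairs to the driver as a Map.
    // Going through .rdd keeps collectAsMap available on Spark 2.x as well,
    // where DataFrame.map returns a Dataset rather than an RDD.
    val name_dict: scala.collection.Map[String, String] =
      df2.rdd.map(row => row.getAs[String]("Col1") -> row.getAs[String]("Col2")).collectAsMap()

    // Alias every column to its mapped name; unmapped columns keep their name.
    val df3 = df1.select(df1.columns.map(c => col(c).as(name_dict.getOrElse(c, c))): _*)
    df3.show()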


  }

}

Upvotes: 2

Pratyush Sharma

Reputation: 289

You can do it this way too (df1 and df2 are the same as in @AnuragSharma's answer):

val spark: SparkSession = ???
import spark.implicits._

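// Join the current column names (a single-column DataFrame named "value")
// with the lookup table and collect the new names.
// Caveat: a join does not guarantee row order, and this inner join drops
// columns that have no entry in Col1, so check the result on real data.
val to = df1.columns.toSeq.toDF.join(df2, $"value" === df2("Col1"))
  .select("Col2")
  .collect.map(row => row.getString(0)).toList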

val newDF = df1.toDF(to: _*)

newDF.show()

// +---+---+---+---+
// |  E|  Q|  R|  Z|
// +---+---+---+---+
// |  1|  2|  3|  4|
// +---+---+---+---+
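
A join does not guarantee row order, and an inner join drops names that have no match, so the list of new names can come back reordered or shorter than df1.columns. One possible way around this is to compute the new names on the driver, in the original column order. The sketch below is only an illustration: renamedColumns is a hypothetical helper, not part of Spark, and the extra column "X" is made up to show the fallback.

```scala
// Hypothetical helper (not a Spark API): map each existing column name to
// its new name, in the original order. Names missing from the mapping are
// kept unchanged, so the result always has the same length as the input.
def renamedColumns(columns: Seq[String], mapping: Map[String, String]): Seq[String] =
  columns.map(c => mapping.getOrElse(c, c))

// The mapping from the question, plus a made-up column "X" with no entry.
val mapping = Map("A" -> "E", "B" -> "Q", "C" -> "R", "D" -> "Z")
val newNames = renamedColumns(Seq("A", "B", "C", "D", "X"), mapping)
println(newNames.mkString(" "))  // E Q R Z X
```

In spark-shell, assuming df2 is small enough to collect, the mapping could be built with df2.collect().map(r => r.getString(0) -> r.getString(1)).toMap and applied with df1.toDF(renamedColumns(df1.columns.toSeq, mapping): _*).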

Upvotes: 2
