getaway22

Reputation: 199

Rename columns of a Spark data frame based on a CSV file

I'm having trouble renaming the header of a data frame based on a CSV file.

I have the following data frame, df1:

Att1   Att2     Att3   
23      m        0      
22      m        1      
42      f        0   
32      f        0    
45      m        1    

Now I want to change the column names (the header) based on a CSV file that maps old names to new names and looks like this:

Att1,age
Att2,gender      
Att3,employed 
...,...    
Att99,colnameY     
Att100,colnameZ

As a result, I expect a data frame which looks like this:

age   gender    employed   
23      m        0      
22      m        1      
42      f        0   
32      f        0    
45      m        1    

Any ideas? Thank you for your help :)

Upvotes: 1

Views: 1904

Answers (1)

akuiper

Reputation: 214957

import scala.io.Source

// Read the old-name -> new-name mapping from the CSV (one "old,new" pair per line).
val source = Source.fromFile("names.csv")
val map = try {
  source.getLines()
    .map(_.split(",").map(_.trim))  // trim: the sample file has trailing spaces
    .collect { case Array(oldName, newName) => oldName -> newName }
    .toMap
} finally source.close()
// map: scala.collection.immutable.Map[String,String] = Map(Att1 -> age, Att2 -> gender, Att3 -> employed)

// Rename each column with withColumnRenamed; columns missing from the map keep their name.
df1.columns.foldLeft(df1) {
  case (df, col) => df.withColumnRenamed(col, map.getOrElse(col, col))
}.show()
+---+------+--------+
|age|gender|employed|
+---+------+--------+
| 23|     m|       0|
| 22|     m|       1|
| 42|     f|       0|  
| 32|     f|       0|
| 45|     m|       1|
+---+------+--------+
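
With around 100 columns, it may be simpler to compute the whole new header first and replace it in one call instead of chaining one withColumnRenamed per column. The following is only a minimal sketch of that idea, not part of the answer above: the object RenameColumns and its helpers parseMapping and renamed are hypothetical names, the sample lines stand in for names.csv, and the column Att4 is made up to show the fallback. It is meant to be pasted into spark-shell or a Scala REPL; the Spark call itself is left as a comment because it needs df1.

```scala
// Pure-Scala sketch: no Spark is needed to compute the new header.
object RenameColumns {
  // Parse "old,new" lines into a lookup map, trimming stray whitespace.
  // Lines that do not split into exactly two fields are skipped.
  def parseMapping(lines: Iterator[String]): Map[String, String] =
    lines
      .map(_.split(",").map(_.trim))
      .collect { case Array(oldName, newName) => oldName -> newName }
      .toMap

  // Return the new header; names absent from the mapping are kept unchanged.
  def renamed(columns: Seq[String], mapping: Map[String, String]): Seq[String] =
    columns.map(c => mapping.getOrElse(c, c))
}

// Sample data standing in for names.csv and df1.columns (assumed values).
val mapping = RenameColumns.parseMapping(
  Iterator("Att1,age", "Att2,gender      ", "Att3,employed "))
val newNames = RenameColumns.renamed(Seq("Att1", "Att2", "Att3", "Att4"), mapping)
println(newNames.mkString(", "))  // age, gender, employed, Att4

// With Spark available, the whole header can then be replaced in one call:
// val renamedDf = df1.toDF(newNames: _*)
```

toDF takes the new names positionally, so newNames must stay in the same order as df1.columns, which the map over the column list guarantees.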

Upvotes: 2
