Aditya Seth

Reputation: 53

How can I add a sequence of strings as columns to a DataFrame and express it as a transform?

I have a sequence of strings:

val listOfString : Seq[String] = Seq("a","b","c")

How can I write a transform like this:

def addColumn(example: Seq[String]): DataFrame => DataFrame = {
  // some code that returns a transform which adds these strings as columns to the DataFrame
  ???
}

Input:
+----+
| id |
+----+
|  1 |
+----+

Output:
+----+---+---+---+
| id | a | b | c |
+----+---+---+---+
|  1 | 0 | 0 | 0 |
+----+---+---+---+

I am only interested in a solution written as a transform.

Upvotes: 1

Views: 1577

Answers (2)

abiratsis

Reputation: 7336

You can use the transform method of Dataset together with a single select statement:

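import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Curried so that addColumns(extraCols) can be passed to df.transform
def addColumns(extraCols: Seq[String])(df: DataFrame): DataFrame = {
  // keep the existing columns, then append one zero-filled column per name
  val selectCols = df.columns.map(col) ++ extraCols.map(c => lit(0).as(c))
  df.select(selectCols: _*)
}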


// usage example
val yourExtraColumns : Seq[String] = Seq("a","b","c")

df.transform(addColumns(yourExtraColumns))
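
The question asks for a function whose declared return type is DataFrame => DataFrame. The curried definition above already behaves that way when partially applied, but the same idea can be spelled out explicitly. The following is a minimal sketch, assuming a local SparkSession; the names TransformSketch and addColumnsFn are hypothetical and only for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object TransformSketch {
  // Hypothetical helper: returns a transform that keeps the existing columns
  // and appends one integer column filled with 0 for each name in extraCols.
  def addColumnsFn(extraCols: Seq[String]): DataFrame => DataFrame = { df =>
    val selectCols = df.columns.map(col) ++ extraCols.map(c => lit(0).as(c))
    df.select(selectCols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("transform-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1).toDF("id")
    val result = df.transform(addColumnsFn(Seq("a", "b", "c")))
    result.show()

    spark.stop()
  }
}
```

Because addColumnsFn returns a plain function value, it can also be combined with other transforms using andThen before being handed to transform.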

Resources

https://towardsdatascience.com/dataframe-transform-spark-function-composition-eb8ec296c108

https://mungingdata.com/apache-spark/chaining-custom-dataframe-transformations/

Upvotes: 1

notNull

Reputation: 31540

Use .toDF() and pass your listOfString. This renames the existing columns, so the number of names must match the number of columns.

Example:

//sample dataframe
df.show()
//+---+---+---+
//| _1| _2| _3|
//+---+---+---+
//|  0|  0|  0|
//+---+---+---+


df.toDF(listOfString:_*).show()
//+---+---+---+
//|  a|  b|  c|
//+---+---+---+
//|  0|  0|  0|
//+---+---+---+

UPDATE:

Use foldLeft to add the columns to the existing dataframe with values.

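import org.apache.spark.sql.functions.lit
import spark.implicits._ // already in scope in spark-shell

val df = Seq("1").toDF("id")

val listOfString: Seq[String] = Seq("a", "b", "c")

// lit("0") adds string columns; use lit(0) if integer zeros are wanted
val new_df = listOfString.foldLeft(df) { (acc, colName) => acc.withColumn(colName, lit("0")) }
new_df.show()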
//+---+---+---+---+
//| id|  a|  b|  c|
//+---+---+---+---+
//|  1|  0|  0|  0|
//+---+---+---+---+

// or wrap it in a function
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

def addColumns(extraCols: Seq[String], df: DataFrame): DataFrame =
  extraCols.foldLeft(df) { (acc, colName) => acc.withColumn(colName, lit("0")) }

addColumns(listOfString,df).show()
//+---+---+---+---+
//| id|  a|  b|  c|
//+---+---+---+---+
//|  1|  0|  0|  0|
//+---+---+---+---+
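
The function above takes the DataFrame in the same parameter list as the column names, so it cannot be passed to transform as written. One way to adapt it, sketched here under the assumption of a local SparkSession, is to move the DataFrame into a second parameter list. The names FoldLeftTransformSketch and withZeroColumns are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object FoldLeftTransformSketch {
  // Hypothetical helper: curried so that withZeroColumns(names) can be
  // passed to df.transform. Each name becomes a string column holding "0".
  def withZeroColumns(extraCols: Seq[String])(df: DataFrame): DataFrame =
    extraCols.foldLeft(df) { (acc, colName) => acc.withColumn(colName, lit("0")) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for trying this out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("foldleft-transform-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("1").toDF("id")
    val listOfString: Seq[String] = Seq("a", "b", "c")

    df.transform(withZeroColumns(listOfString)).show()

    spark.stop()
  }
}
```

Note that withColumn replaces an existing column of the same name, and each call adds a projection to the plan, so for a long list of names a single select may be the lighter option.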

Upvotes: 1
