Holmes

Reputation: 1079

Creating Empty DF and adding column is NOT working

I'm trying to create an empty DataFrame and add a new column to it. I tried two options: Option A works, but Option B does not. Please help!

Option A:

```
var initialDF1 = Seq("test").toDF("M")
initialDF1 = initialDF1.withColumn("P", lit("P"))
initialDF1.show
+----+---+
|   M|  P|
+----+---+
|test|  P|
+----+---+
```

Option B: (Not working)

```
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit
val schema = StructType(List(StructField("N", StringType, true)))
var initialDF = spark.createDataFrame(sc.emptyRDD[Row], schema)
initialDF = initialDF.withColumn("P", lit("P"))
initialDF.show
+---+---+
|  N|  P|
+---+---+
+---+---+
```

Upvotes: 0

Views: 831

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

It is working as intended. `withColumn` adds the column to the schema and computes its value (a `lit` or some other expression) for each existing row, so values only appear for rows that already exist. In your second case you created an empty DataFrame, so `withColumn` adds "P" to the schema, but there are no rows to put a value in.
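To make the difference concrete, here is a minimal sketch (not the only way to do it) that contrasts the empty case with one where a row exists before `withColumn` is called. The names `emptyDF`, `oneRowDF`, and the local `SparkSession` setup are illustrative assumptions; in `spark-shell` the `spark` value already exists and that line can be dropped.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("empty-df-sketch")
  .getOrCreate()

val schema = StructType(List(StructField("N", StringType, nullable = true)))

// Empty DataFrame: the schema is defined, but there are zero rows.
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// withColumn adds "P" to the schema; with no rows, no values are produced.
val emptyWithP = emptyDF.withColumn("P", lit("P"))
emptyWithP.printSchema()      // lists both N and P
println(emptyWithP.count())   // 0

// To see a value in "P", at least one row must exist before withColumn runs.
// Here a one-row DataFrame with the same schema is unioned onto the empty one.
val oneRowDF = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("test"))), schema)
val oneRowWithP = emptyDF.union(oneRowDF).withColumn("P", lit("P"))
oneRowWithP.show()
```

So if the goal is to start from an empty DataFrame with a fixed schema and fill it later, the extra column can be added at any point; it simply has nothing to show until rows are unioned in.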

Upvotes: 3
