I'm trying to create an empty DataFrame and append a new column to it. I tried two approaches: Option A works, but Option B does not. Please help!
Option A:
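```scala
import org.apache.spark.sql.functions.lit
import spark.implicits._ // needed for toDF; spark-shell imports it automatically

var initialDF1 = Seq("test").toDF("M")
initialDF1 = initialDF1.withColumn("P", lit("P"))
initialDF1.show()
// +----+---+
// |   M|  P|
// +----+---+
// |test|  P|
// +----+---+
```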
Option B (not working):
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val schema = StructType(List(StructField("N", StringType, true)))
var initialDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
initialDF = initialDF.withColumn("P", lit("P"))
initialDF.show()
// +---+---+
// |  N|  P|
// +---+---+
// +---+---+
```
It is working as intended. `withColumn` changes the schema and computes the new column's value (with `lit` or any other expression) for each existing row; it never creates rows. In your second case you created an empty DataFrame, so `withColumn` adds "P" to every existing row, and there are none. That is why `show` prints both column headers but no data.
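A minimal sketch to check this behaviour, assuming Spark 2.x or later run as a standalone application with a local master. The object and method names (`WithColumnOnEmptyDF`, `emptyWithP`, `seededWithP`) are hypothetical, and seeding the DataFrame with a single `"test"` row is just one illustrative way to give `withColumn` a row to fill. Both DataFrames end up with the same schema; only the seeded one has data.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object WithColumnOnEmptyDF {
  // Local session for the sketch; spark-shell already provides `spark`.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("withColumn-empty-df")
    .getOrCreate()

  val schema = StructType(List(StructField("N", StringType, nullable = true)))

  // Option B from the question: built from an empty RDD, so zero rows.
  // withColumn adds "P" to the schema, but there is no row to evaluate lit("P") on.
  def emptyWithP(): DataFrame =
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
      .withColumn("P", lit("P"))

  // Same schema seeded with one row (hypothetical value "test"),
  // so withColumn has exactly one row to fill.
  def seededWithP(): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row("test"))), schema)
      .withColumn("P", lit("P"))

  def main(args: Array[String]): Unit = {
    val empty = emptyWithP()
    println(empty.schema.fieldNames.mkString(",")) // N,P
    println(empty.count())                         // 0
    println(seededWithP().count())                 // 1
    seededWithP().show()
    spark.stop()
  }
}
```

In spark-shell you can drop the object wrapper and the session builder, since `spark` is already defined there.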