Reputation: 4010
I am trying to create a DataFrame with one row whose values are all null.
val df = Seq(null,null).toDF("a","b")
That fails, and using null.instanceof did not work either. The following does:
val df = Seq(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a","b")
This works, but I would rather not have to specify the type of each field; mostly it should be String.
Upvotes: 1
Views: 1289
Reputation: 27383
My preferred way is to use Option.empty[A]:
val df = Seq((Option.empty[Int],Option.empty[Int])).toDF("a","b")
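Since the question mostly needs String columns, the same pattern should work with Option.empty[String]; an empty Option is stored as null in a nullable column. A minimal sketch, assuming a local SparkSession (the object name, method name and app name are illustrative only):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OptionNullsExample {
  // One tuple = one row; each Option.empty[String] is stored as null
  // in a nullable string column
  def nullStringRow(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((Option.empty[String], Option.empty[String])).toDF("a", "b")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("option-nulls-sketch") // arbitrary name
      .master("local[1]")             // local mode, for trying it out
      .getOrCreate()
    val df = nullStringRow(spark)
    df.printSchema() // expected: two nullable string columns, a and b
    df.show()
    spark.stop()
  }
}
```

Swapping String for any other type A changes only the column type; the single row still holds nulls.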
Upvotes: 2
Reputation: 492
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Spark recommends defining main() instead of extending scala.App
object SparkApp {
  def main(args: Array[String]): Unit = {
    val sparkSession: SparkSession = SparkSession.builder()
      .appName("Spark_Test_App")
      .master("local[2]")
      .getOrCreate()

    // Target schema: two nullable integer columns
    val schema: StructType = StructType(
      Array(
        StructField("a", IntegerType, nullable = true),
        StructField("b", IntegerType, nullable = true)
      )
    )

    import sparkSession.implicits._
    // Seq((null, null)) is a Seq[(Null, Null)]; keep only its rows...
    val nullRDD: RDD[Row] = Seq((null, null)).toDF("a", "b").rdd
    // ...and rebuild the DataFrame with the typed schema above
    val df: DataFrame = sparkSession.createDataFrame(nullRDD, schema)

    df.printSchema()
    df.show()
    sparkSession.stop()
  }
}
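The toDF(...).rdd round trip above is not strictly needed: createDataFrame also accepts rows built directly with Row(...) together with a schema. A minimal sketch of that variant, assuming String columns as the question prefers; the object and method names are illustrative only:

```scala
import java.util.Collections
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RowSchemaNulls {
  // Explicit schema: both columns are nullable strings
  val schema: StructType = StructType(Seq(
    StructField("a", StringType, nullable = true),
    StructField("b", StringType, nullable = true)
  ))

  // Build the single all-null row directly, with no intermediate DataFrame
  def nullRow(spark: SparkSession): DataFrame =
    spark.createDataFrame(Collections.singletonList(Row(null, null)), schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("row-schema-nulls-sketch") // arbitrary name
      .master("local[1]")
      .getOrCreate()
    val df = nullRow(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Because the column types come from the schema rather than from Scala types, this keeps the field types in one place, which may suit the wish not to annotate each value.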
Upvotes: 0
Reputation: 37852
I'm assuming you want a two-column DF; in that case each entry should be a tuple or a case class. If so, you can also state the type of the Seq explicitly so that you don't have to use asInstanceOf:
val df = Seq[(Integer, Integer)]((null, null)).toDF("a","b")
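For String columns the same idea works directly: String is a reference type, so null typechecks without asInstanceOf (with Scala's Int it would not, which is why the line above uses java.lang.Integer). A sketch of both the tuple and the case-class variants; NullablePair and the other names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class, defined at top level so Spark can derive an encoder
case class NullablePair(a: String, b: String)

object TypedSeqNulls {
  // Tuple variant: the explicit element type lets both nulls typecheck as String
  def fromTuple(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, String)]((null, null)).toDF("a", "b")
  }

  // Case-class variant: column names come from the field names
  def fromCaseClass(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(NullablePair(null, null)).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-seq-nulls-sketch") // arbitrary name
      .master("local[1]")
      .getOrCreate()
    fromTuple(spark).show()
    fromCaseClass(spark).show()
    spark.stop()
  }
}
```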
Upvotes: 3
Reputation: 7207
Looks like a misprint in "asInstanceOf" (the question mentions "instanceof"). It worked fine for me, producing a single column "a" with two null rows:
List(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a").show(false)
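The asInstanceOf approach also gives the one-row, two-column shape from the question if the two nulls are wrapped in a tuple, rather than listed as separate elements. A minimal sketch with String columns; the object and method names are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AsInstanceOfNulls {
  // A tuple of two typed nulls is ONE row with two columns,
  // whereas List(x, y) would be two rows in a single column
  def nullRow(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((null.asInstanceOf[String], null.asInstanceOf[String])).toDF("a", "b")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("asinstanceof-nulls-sketch") // arbitrary name
      .master("local[1]")
      .getOrCreate()
    nullRow(spark).show(false)
    spark.stop()
  }
}
```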
Upvotes: 0