Reputation: 4010

DataFrame Initialization with null values

I am trying to create a DataFrame with one row whose values are all null.

val df = Seq(null,null).toDF("a","b")

I ran into issues with this, and using null.instanceof did not help either.

val df = Seq(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a","b")

This works, but I would rather not specify the field type; in most cases it should be a string.

Upvotes: 1

Answers (4)

Raphael Roth

Reputation: 27383

My preferred way is to use Option.empty[A]:

val df = Seq((Option.empty[Int],Option.empty[Int])).toDF("a","b")
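
Since the question mentions that the columns should mostly be strings, the same idea can be sketched with Option.empty[String]. This is only a sketch: the session name `spark` and its builder settings are assumptions for illustration, not part of the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for illustration only.
val spark = SparkSession.builder()
  .appName("null-row-option-sketch")
  .master("local[1]")
  .getOrCreate()
import spark.implicits._

// One row with two string columns, both null (None becomes null).
val df = Seq((Option.empty[String], Option.empty[String])).toDF("a", "b")

df.printSchema()
df.show()
```

Using Option keeps the element type explicit without any casts, so the column type is taken from the type parameter.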

Upvotes: 2

Mansoor Baba Shaik

Reputation: 492

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object SparkApp extends App {

  val sparkSession: SparkSession = SparkSession.builder()
    .appName("Spark_Test_App")
    .master("local[2]")
    .getOrCreate()

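  // Target schema: two nullable integer columns.
  val schema: StructType = StructType(
    Array(
      StructField("a", IntegerType, nullable = true),
      StructField("b", IntegerType, nullable = true)
    )
  )

  import sparkSession.implicits._
  // One row of nulls, converted to an RDD[Row] so the schema can be applied.
  val nullRDD: RDD[Row] = Seq((null, null)).toDF("a", "b").rdd

  // Re-create the DataFrame with the explicit integer schema.
  val df: DataFrame = sparkSession.createDataFrame(nullRDD, schema)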

  df.printSchema()

  df.show()

  sparkSession.stop()
}
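
A variation on the explicit-schema approach is to build the Row directly instead of going through toDF first. The following is a minimal sketch under assumptions: the session name `spark` is illustrative, and StringType is chosen only because the question says the fields are mostly strings.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed local session, for illustration only.
val spark = SparkSession.builder()
  .appName("null-row-schema-sketch")
  .master("local[1]")
  .getOrCreate()

// The schema, not the data, decides the column types.
val schema = StructType(Seq(
  StructField("a", StringType, nullable = true),
  StructField("b", StringType, nullable = true)
))

// A single Row whose two fields are both null.
val rows = spark.sparkContext.parallelize(Seq(Row(null, null)))

val df = spark.createDataFrame(rows, schema)
df.printSchema()
df.show()
```

Because the types live in the schema, the null values themselves need no type annotation.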

Upvotes: 0

Tzach Zohar

Reputation: 37852

I'm assuming you want a two-column DF, in which case each entry should be a tuple or a case class. If so, you can also state the type of the Seq explicitly so that you don't have to use asInstanceOf:

val df = Seq[(Integer, Integer)]((null, null)).toDF("a","b")
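
The element type can differ per column. A small sketch with one string column and one integer column, assuming a local session named `spark` (an illustrative name, not from the question):

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for illustration only.
val spark = SparkSession.builder()
  .appName("null-row-typed-seq-sketch")
  .master("local[1]")
  .getOrCreate()
import spark.implicits._

// Reference types (String, java.lang.Integer) can hold null; Scala's Int cannot.
val df = Seq[(String, Integer)]((null, null)).toDF("a", "b")

df.printSchema()
df.show()
```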

Upvotes: 3

pasha701

Reputation: 7207

Looks like a misprint in "asInstanceOf"; this worked fine for me:

List(null.asInstanceOf[Integer],null.asInstanceOf[Integer]).toDF("a").show(false)

Upvotes: 0
