Paul Reiners

Reputation: 7886

Creating a struct field in a row of a dataframe

I have the code below, where I'm trying to create a Spark DataFrame with a field that is a struct. What should I replace ??? with to get this to work?

import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark: SparkSession = SparkSession.builder()
  .appName("NodesLanesTest")
  .getOrCreate()
val someData = Seq(
  Row(1538161836000L, 1538075436000L, "cargo3", 3L, ???("Chicago", "1234"))
)
val someSchema = StructType(
  List(
    StructField("ata", LongType, nullable = false),
    StructField("atd", LongType, nullable = false),
    StructField("cargo", StringType, nullable = false),
    StructField("createdDate", LongType, nullable = false),
    StructField("destination",
      StructType(List(
        StructField("name", StringType, nullable = false),
        StructField("uuid", StringType, nullable = false)
      )))))
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  someSchema
)

Upvotes: 2

Views: 2114

Answers (1)

Álvaro Valencia

Reputation: 1217

You're missing a Row object. When you create a DataFrame from a sequence of Row objects, any field whose type is a StructType is expected to be represented as a nested Row, so this should work for you:

val someData = Seq(
  Row(1538161836000L, 1538075436000L, "cargo3", 3L, Row("Chicago", "1234"))
)
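
For reference, here is a minimal end-to-end sketch of how the nested Row lines up with the nested StructType from the question. The object name `StructFieldExample`, the `destinationType` value, and the `master("local[1]")` setting are my own assumptions, added only so the example can run on its own without a cluster. It builds the DataFrame and then reads the struct's fields back with dot notation:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical wrapper object, used only to make the sketch runnable.
object StructFieldExample {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so no cluster is needed to try this out.
    val spark = SparkSession.builder()
      .appName("NodesLanesTest")
      .master("local[1]")
      .getOrCreate()

    // Schema of the nested struct, pulled out into its own value for readability.
    val destinationType = StructType(List(
      StructField("name", StringType, nullable = false),
      StructField("uuid", StringType, nullable = false)
    ))

    val someSchema = StructType(List(
      StructField("ata", LongType, nullable = false),
      StructField("atd", LongType, nullable = false),
      StructField("cargo", StringType, nullable = false),
      StructField("createdDate", LongType, nullable = false),
      StructField("destination", destinationType)
    ))

    // The struct value is a nested Row whose fields follow destinationType's order.
    val someData = Seq(
      Row(1538161836000L, 1538075436000L, "cargo3", 3L, Row("Chicago", "1234"))
    )

    val someDF = spark.createDataFrame(
      spark.sparkContext.parallelize(someData),
      someSchema
    )

    someDF.printSchema()

    // Fields inside the struct can be selected with dot notation.
    someDF.select("destination.name", "destination.uuid").show()

    val first = someDF.select("destination.name").first()
    assert(first.getString(0) == "Chicago")

    spark.stop()
  }
}
```

The nested Row is matched to the struct by position, not by name, so the values must be listed in the same order as the fields in the inner StructType.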

Hope it helps.

Upvotes: 4
