Code Geek

Reputation: 925

Dataframe Schema prints out empty

I'm trying out creating DataFrames with Spark in Scala, but the code below logs an empty schema and an empty DataFrame.

Can someone tell me what the issue is here? Thanks!

    val simpleData = Seq(
      Row("James", "", "Smith", "36636"),
      Row("Michael", "Rose", "", "40288"),
      Row("Robert", "", "Williams", "42114"),
      Row("Maria", "Anne", "Jones", "39192"),
      Row("Jen", "Mary", "Brown", "")
    )

    val simpleSchema = StructType(Array(
      StructField("firstname", StringType, true),
      StructField("middlename", StringType, true),
      StructField("lastname", StringType, true),
      StructField("id", StringType, true)
    ))

    val df = spark.createDataFrame(sc.parallelize(simpleData), simpleSchema)
    logger.info(s"df printschema: ${df.printSchema()}")
    logger.info(s"df show: ${df.show}")

Upvotes: 0

Views: 1077

Answers (1)

ggordon

Reputation: 10035

df.printSchema() prints the schema of the DataFrame to standard output and returns Unit, so it gives your logger no value to print. df.show behaves the same way. You can access the schema as a StructType using df.schema.
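To see why the logged message looks empty, here is a minimal sketch in plain Scala with no Spark involved. fakePrintSchema is a hypothetical stand-in for df.printSchema(): it prints as a side effect and returns Unit, and the message is built the same way as in the question.

```scala
object UnitInterpolationDemo {
  // Hypothetical stand-in for df.printSchema(): prints to stdout, returns Unit.
  def fakePrintSchema(): Unit =
    println("root\n |-- firstname: string (nullable = true)")

  // Built like the logger call in the question. The method runs first
  // (its text goes to stdout), then its Unit result is interpolated as "()".
  def buildMessage(): String = s"df printschema: ${fakePrintSchema()}"

  def main(args: Array[String]): Unit = {
    val message = buildMessage()
    println(message) // prints "df printschema: ()"
  }
}
```

The schema text does appear, but on stdout and before the log line, so the logger itself only ever receives "()".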

Try changing

    logger.info(s"df printschema: ${df.printSchema()}")

to

    logger.info(s"df printschema: ${df.schema.simpleString}")

or, for the JSON representation,

    logger.info(s"df printschema: ${df.schema.json}")
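If you want the same tree layout that printSchema() prints, df.schema.treeString returns it as a String. For df.show I'm not aware of a public method that returns the table as a String, so one option is to capture what it prints. The sketch below is plain Scala; captureStdout is a hypothetical helper name, and it assumes the code being captured prints through Scala's Console (println), which I believe is what Spark's show and printSchema do.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureStdout {
  // Hypothetical helper: runs `body` with Scala's Console.out redirected
  // to an in-memory buffer and returns everything that was printed.
  def captureStdout(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, "UTF-8")
    Console.withOut(stream)(body)
    stream.flush()
    buffer.toString("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for df.show(): any code that prints via println.
    val table = captureStdout(println("+---------+\n|firstname|\n+---------+"))
    print(s"df show:\n$table")
  }
}
```

With a real DataFrame the call would look like logger.info(s"df show:\n${captureStdout(df.show())}"). Note that Console.withOut only redirects output written through Scala's Console on the current thread, not direct writes to System.out.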

Let me know if this works for you.

Upvotes: 2
