Reputation: 925
I'm trying out creating DataFrames in Scala Spark, but this logs an empty schema and an empty DataFrame.
Can someone tell me what the issue is here? Thanks!
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val simpleData = Seq(
  Row("James", "", "Smith", "36636"),
  Row("Michael", "Rose", "", "40288"),
  Row("Robert", "", "Williams", "42114"),
  Row("Maria", "Anne", "Jones", "39192"),
  Row("Jen", "Mary", "Brown", "")
)

val simpleSchema = StructType(Array(
  StructField("firstname", StringType, true),
  StructField("middlename", StringType, true),
  StructField("lastname", StringType, true),
  StructField("id", StringType, true)
))

val df = spark.createDataFrame(sc.parallelize(simpleData), simpleSchema)

logger.info(s"df printschema: ${df.printSchema()}")
logger.info(s"df show: ${df.show}")
```
Upvotes: 0
Views: 1077
Reputation: 10035
`df.printSchema()` prints the schema of the dataframe to standard output and returns `Unit`, so there is no value for your logger to interpolate. You can access the schema as a `StructType` using `df.schema`.
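The same gotcha can be shown without Spark: interpolating a `Unit`-returning call into a string yields the literal `"()"`, while the real output goes to stdout. A minimal sketch, using a hypothetical `printSchemaLike` stand-in for `printSchema()`:

```scala
// A stand-in for printSchema(): prints as a side effect and returns Unit.
def printSchemaLike(): Unit = println("root\n |-- firstname: string (nullable = true)")

// Interpolating the Unit result gives "()" — the schema text went to
// stdout, not into the logged string.
val logged = s"df printschema: ${printSchemaLike()}"
// logged == "df printschema: ()"
```

This is why the log line looked empty: the logger received `()`, not the schema.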
Try changing

```scala
logger.info(s"df printschema: ${df.printSchema()}")
```

to

```scala
logger.info(s"df printschema: ${df.schema.simpleString}")
```

or, for the JSON representation,

```scala
logger.info(s"df printschema: ${df.schema.json}")
```
Let me know if this works for you.
Upvotes: 2