Reputation: 509
When I try to read a Spark DataFrame column containing a JSON string as an array, with a defined schema, it returns null. I tried Array, Seq and List for the schema but all return null. My Spark version is 2.2.0.
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

val dfdata = spark.sql("""select "[{ \"id\":\"93993\", \"name\":\"Phil\" }, { \"id\":\"838\", \"name\":\"Don\" }]" as theJson""")
dfdata.show(5, false)

val sch = StructType(
  Array(StructField("id", StringType, true),
        StructField("name", StringType, true)))
print(sch.prettyJson)

dfdata.select(from_json($"theJson", sch)).show
and the output:
+---------------------------------------------------------------+
|theJson |
+---------------------------------------------------------------+
|[{ "id":"93993", "name":"Phil" }, { "id":"838", "name":"Don" }]|
+---------------------------------------------------------------+
{
"type" : "struct",
"fields" : [ {
"name" : "id",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}

+----------------------+
|jsontostructs(theJson)|
+----------------------+
| null|
+----------------------+
Upvotes: 2
Views: 4763
Reputation: 66
Have you tried parsing your JSON string before obtaining a DataFrame?
import spark.implicits._   // needed for Seq(...).toDS

// obtaining this string should be easy:
val jsonStr = """[{ "id":"93993", "name":"Phil" }, { "id":"838", "name":"Don" }]"""
// then you can take advantage of schema inference
val df2 = spark.read.json(Seq(jsonStr).toDS)
df2.show(false)
// it shows:
// +-----+----+
// |id |name|
// +-----+----+
// |93993|Phil|
// |838 |Don |
// +-----+----+
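If you want to keep the question's dfdata rather than start from a literal string, a similar variation is to hand the column to spark.read.json as a Dataset[String]. This is just a sketch, assuming the dfdata and theJson column from the question (spark.read.json on a Dataset[String] needs Spark 2.2+):

import spark.implicits._   // needed for .as[String] and $"..."

// Sketch: schema inference on the existing string column from the question's dfdata
val parsed = spark.read.json(dfdata.select($"theJson").as[String])
parsed.show(false)
// should print the same two-row (id, name) table as above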
Upvotes: 0
Reputation: 2495
Your schema isn't quite right for your example. Your example is an array of structs, so try wrapping it in an ArrayType:
val sch = ArrayType(StructType(Array(
StructField("id", StringType, true),
StructField("name", StringType, true)
)))
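With that schema, the from_json call from the question should come back non-null. A minimal sketch, assuming the dfdata, imports and from_json call from the question:

// Sketch: same select as in the question, but with the ArrayType schema above
dfdata.select(from_json($"theJson", sch).as("parsed")).show(false)
// expected output (exact formatting may differ slightly by Spark version):
// +--------------------------+
// |parsed                    |
// +--------------------------+
// |[[93993,Phil], [838,Don]] |
// +--------------------------+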
Upvotes: 1