Reputation: 23
I have a JSON file like the one below:
{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}]}
I want to create a schema for this, and if the JSON file is empty ({}), the value should be an empty String.
However, this is the output when I call df.show:
[[012, XYZ1234, 0, event, ], [013, ABC1234, 1, event, ]]
I created the schema like this:
val schemaF = ArrayType(
  StructType(
    Array(
      StructField("CName", StringType),
      StructField("CValue", StringType),
      StructField("CLevel", StringType),
      StructField("msg", StringType),
      StructField("CType", StringType)
    )
  )
)
When I tried the following,
val df1 = df.withColumn("Codes",from_json('Codes, schemaF))
it threw an AnalysisException:
org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(`Codes`)' due to data type mismatch: argument 1 requires string type, however, '`Codes`' is of array<struct<CName:string,CValue:string,CLevel:string,CType:string,msg:string>> type.;;
'Project [valid#51, jsontostructs(ArrayType(StructType(StructField(CName,StringType,true), StructField(CValue,StringType,true), StructField(CLevel,StringType,true), StructField(msg,StringType,true), StructField(CType,StringType,true)),true), Codes#8, Some(America/Bogota)) AS errorCodes#77]
Can someone please tell me why and how to resolve this issue?
Upvotes: 1
Views: 3206
Reputation: 32700
Your schema does not correspond to the JSON file you're trying to read. It's missing the top-level field Codes, which is of array type. It should look like this:
val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ),
        true
      ),
      true
    )
  )
)
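Apply the schema when reading the JSON file rather than with the from_json function. from_json expects a string column, and Codes has already been parsed into an array of structs, which is what the AnalysisException is reporting: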
val df = spark.read.schema(schema).json("path/to/json/file")
df.printSchema
//root
// |-- Codes: array (nullable = true)
// |    |-- element: struct (containsNull = true)
// |    |    |-- CLevel: string (nullable = true)
// |    |    |-- CName: string (nullable = true)
// |    |    |-- CType: string (nullable = true)
// |    |    |-- CValue: string (nullable = true)
// |    |    |-- msg: string (nullable = true)
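For contrast, here is a minimal sketch of the case where from_json does apply: the JSON is still raw text in a string column. The column name raw, the local SparkSession, and the sample rows are assumptions made for illustration, and the snippet is meant for spark-shell or a Scala script with Spark on the classpath. It also shows that an empty document ({}) yields a null Codes value rather than an empty string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("from-json-sketch").getOrCreate()
import spark.implicits._

// Struct describing one element of the Codes array.
val elementType = StructType(Array(
  StructField("CName", StringType),
  StructField("CValue", StringType),
  StructField("CLevel", StringType),
  StructField("msg", StringType),
  StructField("CType", StringType)
))
// Full document schema: a single top-level field, Codes.
val docSchema = StructType(Array(StructField("Codes", ArrayType(elementType))))

// Hypothetical input: one raw JSON document per row in a string column named "raw".
val rawDf = Seq(
  """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}]}""",
  "{}"
).toDF("raw")

// from_json is valid here because "raw" is a string column.
val parsed = rawDf
  .select(from_json($"raw", docSchema).as("doc"))
  .select($"doc.Codes")

// Number of codes per row; a null Codes (from the empty document) counts as 0.
val codesPerRow = parsed.collect().map(r => if (r.isNullAt(0)) 0 else r.getSeq[Any](0).size).toSeq
```

If an empty value is needed instead of null, it would have to be substituted explicitly after parsing, for example with coalesce.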
EDIT:
For the question in your comment, you can use this schema definition:
val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ),
        true
      ),
      true
    ),
    StructField("lid", StructType(Array(StructField("idNo", StringType, true))), true)
  )
)
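As a rough sketch of how a DataFrame read with this schema might be queried, the snippet below reads one in-memory record and flattens it. The sample record (in particular the lid value "A1"), the local SparkSession, and the names sample, flat, and rows are all hypothetical. It is meant for spark-shell or a Scala script with Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("nested-schema-sketch").getOrCreate()
import spark.implicits._

val codeType = StructType(Array(
  StructField("CLevel", StringType, true),
  StructField("CName", StringType, true),
  StructField("CType", StringType, true),
  StructField("CValue", StringType, true),
  StructField("msg", StringType, true)
))
val schema = StructType(Array(
  StructField("Codes", ArrayType(codeType, true), true),
  StructField("lid", StructType(Array(StructField("idNo", StringType, true))), true)
))

// Hypothetical sample record; the "lid" value is invented for this example.
val sample = Seq(
  """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}],"lid":{"idNo":"A1"}}"""
).toDS()

// Same reader call as for a file path, applied to a Dataset[String].
val df = spark.read.schema(schema).json(sample)

// One output row per element of Codes, with the nested idNo alongside it.
val flat = df
  .select($"lid.idNo".as("idNo"), explode($"Codes").as("code"))
  .select($"idNo", $"code.CName", $"code.CValue")

val rows = flat.collect().map(r => (r.getString(0), r.getString(1), r.getString(2))).toSeq
```

Nested struct fields are addressed with dot notation (lid.idNo), while array elements need explode before their fields can be selected as ordinary columns.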
Upvotes: 0
Reputation: 640
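// Codes is a top-level array of structs, so the element struct is wrapped accordingly.
val schema =
  StructType(
    Array(
      StructField("Codes", ArrayType(StructType(
        Array(
          StructField("CName", StringType),
          StructField("CValue", StringType),
          StructField("CLevel", StringType),
          StructField("msg", StringType),
          StructField("CType", StringType)
        )
      )))
    )
  )
val df0 = spark.read.schema(schema).json("/path/to/data.json")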
Upvotes: 0