Reputation: 183
I have a DataFrame in which the column col has the following schema:
col: array
  element: struct
    Id: string
    Seq: int
    Pct: double
    Amt: long
When no data is available, the column arrives with this structure instead:
col: array
  element: string
The column can either contain data or be empty.
When data is available, it arrives from the source in the following format:
{"Id": "123456-1", "Seq": 1, "Pct": 0.1234, "Amt": 3000}
When no data is available, I set a default as follows:
.withColumn("col", when(size($"col") === 0, array(lit("A").cast("string"), lit(0).cast("int"), lit(0.0).cast("double"))).otherwise($"col"))
For the empty case, all of the values appear to be cast to string:
["A", "0", "0.0", "0.0"]
How can I get the following output instead?
{"Id": "A", "Seq": 0, "Pct": 0.0}
When data is available in the source, the output is:
+----------------------------------------------------+
| Data |
+----------------------------------------------------+
|[[236711-1, 0.14, 1.5, 1], [236711-1, 0.14, 2.0, 2]]|
|[[1061605-1, 0.011, 1.0, 1]] |
+----------------------------------------------------+
When data is not available, the output is:
+------+
| Data |
+------+
|[]    |
+------+
Upvotes: 0
Views: 87
Reputation: 42352
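array requires all of its elements to share one type, which is why the mixed literals in your default end up as strings. Create an array containing a single struct instead, so that each field keeps its own type: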
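import org.apache.spark.sql.functions.{array, lit, struct}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
// The $"col" syntax also needs spark.implicits._ in scope.

val df2 = df.withColumn(
  "col",
  // The decision is made once from the column's schema, not per row.
  df.schema("col").dataType match {
    // No data from the source: array<string>, so substitute one default struct.
    case ArrayType(StringType, _) =>
      array(
        struct(
          lit("A").cast("string").as("Id"),
          lit(0).cast("int").as("Seq"),
          lit(0.0).cast("double").as("Pct")
        )
      )
    // Data present: array<struct>, so keep the column unchanged.
    case ArrayType(StructType(_), _) => $"col"
  }
)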
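To check the result, here is a minimal sketch that can be pasted into spark-shell or run as a Scala script with Spark on the classpath. The local session, the app name, and the one-row input df are assumptions made up for illustration; the input only mimics the array<string> schema you get when the source has no data. The fallback case is also my addition, so that other column types pass through instead of raising a MatchError.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit, struct}
import org.apache.spark.sql.types._

// Local session for illustration only (assumption, not part of the question).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("default-struct-demo")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: a single row whose "col" is an empty array<string>.
val df = Seq(Tuple1(Seq.empty[String])).toDF("col")

val df2 = df.withColumn(
  "col",
  df.schema("col").dataType match {
    // array<string>: replace the column with one default struct.
    case ArrayType(StringType, _) =>
      array(struct(lit("A").as("Id"), lit(0).as("Seq"), lit(0.0).as("Pct")))
    // array<struct>: keep the real data.
    case ArrayType(_: StructType, _) => $"col"
    // Anything else: leave untouched (added here as a safety net).
    case _ => $"col"
  }
)

// The element type should now be a struct with typed fields.
df2.printSchema()
df2.show(false)
```

Because the match runs on the schema, the default replaces the column for every row. That fits the case described in the question, where the array<string> schema only appears when the source has no data at all.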
Upvotes: 1