justanothertekguy

Reputation: 183

Datatype modification of the elements in array type column

I have a DataFrame with the following schema for the column col:

col:array
    element:struct
      Id:string
      Seq:int
      Pct:double
      Amt:long

When the data is not available, the schema comes through as:

col:array
   element:string

The column may contain data or may be empty.

When the data is available, it arrives from the source in the following format:

{"Id": "123456-1", "Seq": 1, "Pct": 0.1234, "Amt": 3000}

When the data is not available, I set a default as follows:

.withColumn("col", when(size($"col") === 0, array(lit("A").cast("string"), lit(0).cast("int"), lit(0.0).cast("double"))).otherwise($"col"))

For the empty case, the default values all seem to be cast to string:

["A", "0", "0.0", "0.0"]

How can I get the following output instead?

{"Id": "A", "Seq": 0, "Pct": 0.0}

When data is available in the source, this is the output:

+----------------------------------------------------+
|   Data                                             |
+----------------------------------------------------+
|[[236711-1, 0.14, 1.5, 1], [236711-1, 0.14, 2.0, 2]]|
|[[1061605-1, 0.011, 1.0, 1]]                        |
+----------------------------------------------------+

When data is not available:

+------+
| Data |
+------+
|[]    |
+------+

Upvotes: 0

Views: 87

Answers (1)

mck

Reputation: 42352

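The array function needs a single element type, so Spark casts the mixed values to string. You can create an array containing one struct instead, so that each field keeps its own type: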

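import org.apache.spark.sql.functions.{array, lit, struct}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
// $"col" also needs import spark.implicits._ (assuming the session is named spark)

val df2 = df.withColumn(
    "col",
    // Choose the expression from the column's schema type, not from row values
    df.schema("col").dataType match {
        // No data: the column arrived as array<string>, so replace it with the default
        case ArrayType(StringType, _) =>
            array(
                struct(
                    lit("A").cast("string").as("Id"),
                    lit(0).cast("int").as("Seq"),
                    lit(0.0).cast("double").as("Pct")
                )
            )
        // Data present: the column is already array<struct>, so keep it as is
        case ArrayType(StructType(_), _) => $"col"
    }
)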
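
To see the difference between the two defaults, here is a minimal sketch that compares the element type Spark resolves for each one. The object name ArrayDefaultSchemaDemo and the helper elementTypeOf are made up for illustration. It assumes a local session with spark.sql.ansi.enabled set to false, which matches the string coercion shown in the question; with ANSI mode on, the mixed array may resolve differently or fail.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{array, lit, struct}
import org.apache.spark.sql.types.{ArrayType, DataType}

// Hypothetical demo object, not part of the original answer.
object ArrayDefaultSchemaDemo {
  // Mixed-type elements: array() needs one element type, so Spark coerces them.
  def mixedDefault: Column = array(lit("A"), lit(0), lit(0.0))

  // A single struct element: each field keeps its own type.
  def structDefault: Column = array(
    struct(lit("A").as("Id"), lit(0).as("Seq"), lit(0.0).as("Pct"))
  )

  // Returns the element type Spark resolves for an array-valued expression.
  def elementTypeOf(spark: SparkSession, c: Column): DataType = {
    val dt = spark.range(1).select(c.as("col")).schema("col").dataType
    dt.asInstanceOf[ArrayType].elementType
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-default-schema-demo")
      // Assumption: non-ANSI coercion, as in the question's output.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()

    println(elementTypeOf(spark, mixedDefault).simpleString)
    println(elementTypeOf(spark, structDefault).simpleString)

    spark.stop()
  }
}
```

Under that assumption, the first line printed should be the string type and the second a struct with the Id, Seq and Pct fields, which is why wrapping the defaults in struct preserves the int and double values.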

Upvotes: 1
