Reputation: 1459
I see that several datasets have an array whose elements are structs, instead of a plain array of strings or integers:
 |-- name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- value: string (nullable = true)
Why is that? Ultimately I just want to represent an array of strings, so why have a struct in between?
Upvotes: 0
Views: 68
Reputation: 2178
You can hold an array of strings using ArrayType and StructField. You don't need to use StructType inside the StructField. In the example below, column2 can hold an array of strings; see the schema for "column2". Nevertheless, the schema for the whole row will still be a StructType.
StructType(
  Array(
    StructField("column1", LongType, nullable = true),
    StructField("column2", ArrayType(StringType, true), nullable = true)
  )
)
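If you want to check this without a cluster, here is a minimal sketch that builds the same schema and inspects it. It assumes spark-sql is on the classpath; the object name SimpleArraySchema is made up for illustration.

```scala
import org.apache.spark.sql.types._

// Hypothetical object name, for illustration only.
object SimpleArraySchema {
  // Same schema as above: column2 is an array whose elements are plain strings.
  val schema: StructType = StructType(
    Array(
      StructField("column1", LongType, nullable = true),
      StructField("column2", ArrayType(StringType, containsNull = true), nullable = true)
    )
  )

  def main(args: Array[String]): Unit = {
    // Prints the same tree layout that DataFrame.printSchema() uses.
    println(schema.treeString)
    // The element type of column2 is StringType, with no struct in between.
    println(schema("column2").dataType.simpleString) // array<string>
  }
}
```

In the printed tree, the "element" line under column2 should read "string" rather than "struct", which is the shape the question is asking for.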
You only need a StructType when a column has to hold a complex type made up of several fields, possibly of different data types. It is like holding a table within a column. See the schema for "column2" below.
StructType(
  Array(
    StructField("column1", LongType, nullable = true),
    StructField("column2", ArrayType(
      StructType(Array(
        StructField("column3", StringType, nullable = true),
        StructField("column4", StringType, nullable = true))),
      true))
  )
)
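To see the "table within a column" idea concretely, this sketch builds the nested schema and walks from column2 down to the struct that describes each array element. Again it assumes spark-sql is on the classpath, and the names NestedArraySchema and elementType are made up for illustration.

```scala
import org.apache.spark.sql.types._

// Hypothetical object name, for illustration only.
object NestedArraySchema {
  // column2 is an array whose elements are structs with two string fields.
  val schema: StructType = StructType(Array(
    StructField("column1", LongType, nullable = true),
    StructField("column2", ArrayType(
      StructType(Array(
        StructField("column3", StringType, nullable = true),
        StructField("column4", StringType, nullable = true))),
      containsNull = true))
  ))

  // Walk from column2 down to the struct that describes each array element.
  val elementType: StructType = schema("column2").dataType match {
    case ArrayType(s: StructType, _) => s
    case other => sys.error(s"unexpected type: ${other.simpleString}")
  }

  def main(args: Array[String]): Unit = {
    println(schema.treeString)
    // The fields of each array element behave like the columns of a small table.
    println(elementType.fieldNames.mkString(", ")) // column3, column4
  }
}
```

If you are stuck with an array of structs but only care about one field, selecting the nested field on a DataFrame, e.g. col("column2.column3"), should give you back an array of strings as far as I know.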
Upvotes: 1