Reputation: 447
I want to convert a list of objects and store their attributes as columns.
{
"heading": 1,
"columns": [
{
"col1": "a",
"col2": "b",
"col3": "c"
},
{
"col1": "d",
"col2": "e",
"col3": "f"
}
]
}
Final Result
heading | col1 | col2 | col3
1 | a | b | c
1 | d | e | f
I am currently flattening my data (and excluding the columns column):
df = target_table.relationalize('roottable', temp_path)
However, for this use case I need the columns column. I have seen examples where arrays_zip and explode were used. Would I need to iterate through each object, or is there an easier way to extract each object and convert it into a row?
Upvotes: 0
Views: 658
Reputation: 13998
Using the Spark SQL built-in function inline (or inline_outer) is probably the easiest way to handle this. Use inline_outer when NULL is allowed in columns:
From the Apache Hive documentation:
Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)
df.selectExpr('heading', 'inline_outer(columns)').show()
+-------+----+----+----+
|heading|col1|col2|col3|
+-------+----+----+----+
| 1| a| b| c|
| 1| d| e| f|
+-------+----+----+----+
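To make the semantics of inline_outer concrete, here is a minimal plain-Python sketch (not Spark code; variable names are illustrative) of the same row expansion applied to the question's sample JSON: each struct in the array becomes its own row, with the outer field repeated alongside the struct's fields.

```python
# Sample payload matching the question's JSON.
data = {
    "heading": 1,
    "columns": [
        {"col1": "a", "col2": "b", "col3": "c"},
        {"col1": "d", "col2": "e", "col3": "f"},
    ],
}

# inline_outer explodes each struct in the array into one row per struct,
# repeating the outer column (here: heading) next to the struct's fields.
rows = [{"heading": data["heading"], **struct} for struct in data["columns"]]

for row in rows:
    print(row)
# {'heading': 1, 'col1': 'a', 'col2': 'b', 'col3': 'c'}
# {'heading': 1, 'col1': 'd', 'col2': 'e', 'col3': 'f'}
```

Spark performs this expansion distributedly per row of the DataFrame, so no explicit iteration over the objects is needed.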
Upvotes: 2