Reputation: 5134
I am trying to convert a dataset which declares a column to be of a certain struct type (e.g. struct<x: string, y: string>) to the map<string, string> type, and I would like to do it in SQL, preferably without using UDFs.
UPDATE:
My requirement is also that the transformation be done generically, without any prior knowledge of the struct's keys (I am getting data as complex JSON, and I don't want to keep that complexity in the schema).
Example input data:
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY') union all
  select named_struct('x', 'valX1', 'y', 'valY2')
)
select *
from input
The expected output is a column of type map<string, string>:
struct_col: map<string, string> |
---|
{"x":"valX","y":"valY"} |
{"x":"valX1","y":"valY2"} |
UPDATE:
So far I have managed to find this very convoluted solution, which works only with Spark >= 3.1.0 (because of the json_object_keys function). It would be so nice to just be able to cast a struct to a map:
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY') union all
  select named_struct('x', 'valX1', 'y', 'valY2')
)
select transform_values(
         map_from_arrays(
           json_object_keys(to_json(struct_col)),
           json_object_keys(to_json(struct_col))
         ),
         (k, v) -> get_json_object(to_json(struct_col), '$.' || k))
from input
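For reference, here is what the intermediate pieces evaluate to on the first example row (a sketch; it also needs Spark >= 3.1.0 for json_object_keys):
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY')
)
select to_json(struct_col) as js,                  -- {"x":"valX","y":"valY"}
       json_object_keys(to_json(struct_col)) as ks -- ["x","y"]
from input;
map_from_arrays then builds a key-to-key map, and transform_values replaces each value with the matching get_json_object lookup.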
Upvotes: 6
Views: 10233
Reputation: 581
What about
create_map('x', struct_col.x, 'y', struct_col.y)
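For the record, a runnable version of this idea against the example input (a sketch: in pure Spark SQL the builtin is map rather than the DataFrame-API create_map, and it requires knowing the keys up front, which the update in the question rules out):
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY') union all
  select named_struct('x', 'valX1', 'y', 'valY2')
)
select map('x', struct_col.x, 'y', struct_col.y) as map_col
from input;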
Upvotes: 0
Reputation: 5134
I have found a way to do it which requires one round trip of serializing to JSON and parsing it back, using the to_json and from_json functions. The trick is that from_json also takes a schema argument, where I use the map<string, string> type.
Also, this solution should work with Spark < 3.x.
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY')
  union all
  select named_struct('x', 'valX1', 'y', 'valY2')
)
select from_json(to_json(struct_col), 'map<string, string>') as map_col
from input;
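As a quick sanity check (a sketch, assuming Spark >= 3.0 where typeof is available), you can confirm the resulting column type:
WITH input (struct_col) as (
  select named_struct('x', 'valX', 'y', 'valY')
)
select typeof(from_json(to_json(struct_col), 'map<string,string>')) as map_type
from input;
-- map_type: map<string,string>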
Upvotes: 10