Reputation: 1000
So, I have data in a column which looks like this:
select additional_data,typeof(additional_data) as type
from table
I am interested in querying the field appName
. However, I can't seem to find a way to do this.
I have tried:
select additional_data.content.appName
from table
but that gives me an error saying
can't extract value from additional_data#10527.content: need struct type but got string
since content is string, I then tried casting it to struct , using something like
select additional_data,
cast(additional_data.content as struct)
from table
but that gave me
DataType struct is not supported
I then tried to check if the content could be changed to struct, but I couldn't with
select additional_data,
named_struct('content', additional_data.content),
typeof(named_struct('content', additional_data.content))
from table
The type remained the same as in the original case with the content still being a string
I also looked at this link, but even that I could not get to work
If I did this
select additional_data,
additional_data:content
from table
it errored with
Cannot resolve 'semi_structured_extract_json_multi(table.additional_data, '$.content')' due to data type mismatch: argument 1 requires string type, however, 'table.additional_data' is of struct<content:string> type.
Can someone help with how I can do this correctly and of course efficiently?
Let's assume that the original table can't be altered and the end users can only use pure sql (no spark dataframes)
I can of course answer questions.
Upvotes: 1
Views: 5609
Reputation: 1000
So, two options:
select from_json(additional_data.content,'searchPhrase string, isResultFound string, appName string').appName
from table
select get_json_object(additional_data.content, '$.appName')
from table
Upvotes: 1
Reputation: 1611
The idea is to convert your first line to a structured value, extract the content from content
, then again parse your string to another structured value (through from_json
), then extract the values from the key-value pair.
This should do the trick:
val df = spark.sql(
"""select content,
|from_json(content, 'struct<searchPhrase:string>').searchPhrase as searchPhrase,
|from_json(content, 'struct<isResultFound:string>').isResultFound as isResultFound,
|from_json(content, 'struct<appName:string>').appName as appName
|from (select from_json(additional_data, 'struct<content:string>').content as content from table)
|"""
.stripMargin)
or in Dataset API:
ds = ds
.withColumn("content", expr("from_json(additional_data, 'STRUCT<content:STRING>').content"))
.withColumn("other_json", expr("from_json(content, 'struct<searchPhrase:string,isResultFound:string,appName:string>')"))
.withColumn("searchPhrase", expr("other_json.searchPhrase"))
.withColumn("isResultFound", expr("other_json.isResultFound"))
.withColumn("appName", expr("other_json.appName"))
.drop("other_json", "additional_data")
Final result:
+---------------------------------------------------------------------+------------+-------------+-------+
|content |searchPhrase|isResultFound|appName|
+---------------------------------------------------------------------+------------+-------------+-------+
|{"searchPhrase": "shor", "isResultFound": "true", "appName": "phone"}|shor |true |phone |
+---------------------------------------------------------------------+------------+-------------+-------+
Good luck!
Upvotes: 1