Reputation: 1
I am new to Hive and attempting to run a query where one of the columns (col1)
is described as a type string and contains key value pairs such as {color=blue, name=john, size=M}
. I am trying to extract some of the values so I could do something like return all rows where col1 contains color=blue.
I've been trying to use get_json_object
but I don't think that was the right approach as I'm not sure the field is technically a json array.
Upvotes: 0
Views: 1665
Reputation: 38325
You can convert your string to map (remove curly braces and spaces after comma and use str_to_map function). Example for Hive:
with your_data as
(
select '{color=blue, name=john, size=M}' str
)
select str as original_string,
m['color'] as color,
m['name'] as name,
m['size'] as size
from
(
select str, str_to_map(regexp_replace(regexp_replace(str,'\\{|\\}',''),', *',','),',','=') m
from your_data --use your table
)s;
Result:
original_string color name size
{color=blue, name=john, size=M} blue john M
Upvotes: 0
Reputation: 2441
Using SPARK SQL which is Hive compatible.
In case col1 is a string, this can be a solution:
val initDF = spark.sql("select '{color=blue, name=john, size=M}' as col1 union select '{color=red, name=jim, size=L}' as col1")
initDF.show(false)
It displays:
+-------------------------------+
|col1 |
+-------------------------------+
|{color=blue, name=john, size=M}|
|{color=red, name=jim, size=L} |
+-------------------------------+
And if you want to get only the rows where color=blue
initDF.where("col1 like '%color=blue%'").show(false)
Which shows the expected result:
+-------------------------------+
|col1 |
+-------------------------------+
|{color=blue, name=john, size=M}|
+-------------------------------+
In case col1 is a struct:
val initDFStruct = spark.sql("select 'blue' as color, 'john' as name, 'M' as size union select 'red' as color, 'jim' as name, 'L'")
.selectExpr("(color, name, size) as col1")
initDFStruct.show(false)
It displays:
+---------------+
|col1 |
+---------------+
|[red, jim, L] |
|[blue, john, M]|
+---------------+
initDFStruct.where("col1.color = 'blue'").show(false)
Which shows the wanted result:
+---------------+
|col1 |
+---------------+
|[blue, john, M]|
+---------------+
In summary, if you have it as a string column, you can use in your where clause
where col1 like '%color=blue%'
while if you have it as a struct, you where clause should be:
"col1.color = 'blue'
Upvotes: 1