Eric
Eric

Reputation: 1

Extracting key value pair from string

I am new to Hive and attempting to run a query where one of the columns (col1) is described as a type string and contains key value pairs such as {color=blue, name=john, size=M}. I am trying to extract some of the values so I could do something like return all rows where col1 contains color=blue.

I've been trying to use get_json_object but I don't think that was the right approach as I'm not sure the field is technically a json array.

Upvotes: 0

Views: 1665

Answers (2)

leftjoin
leftjoin

Reputation: 38325

You can convert your string to map (remove curly braces and spaces after comma and use str_to_map function). Example for Hive:

with your_data as 
(
select '{color=blue, name=john, size=M}' str
)

select str        as original_string,
       m['color'] as color,
       m['name']  as name,
       m['size']  as size
from
(
select str, str_to_map(regexp_replace(regexp_replace(str,'\\{|\\}',''),', *',','),',','=') m 
  from your_data --use your table
)s; 

Result:

original_string                 color   name    size    
{color=blue, name=john, size=M} blue    john    M   

Upvotes: 0

datapug
datapug

Reputation: 2441

Using SPARK SQL which is Hive compatible.

In case col1 is a string, this can be a solution:

val initDF = spark.sql("select '{color=blue, name=john, size=M}' as col1 union select '{color=red, name=jim, size=L}' as col1")
initDF.show(false)

It displays:

+-------------------------------+
|col1                           |
+-------------------------------+
|{color=blue, name=john, size=M}|
|{color=red, name=jim, size=L}  |
+-------------------------------+

And if you want to get only the rows where color=blue

initDF.where("col1 like '%color=blue%'").show(false)

Which shows the expected result:

+-------------------------------+
|col1                           |
+-------------------------------+
|{color=blue, name=john, size=M}|
+-------------------------------+

In case col1 is a struct:

val initDFStruct = spark.sql("select 'blue' as color, 'john' as name, 'M' as size union select 'red' as color, 'jim' as name, 'L'")
  .selectExpr("(color, name, size) as col1")
initDFStruct.show(false)

It displays:

+---------------+
|col1           |
+---------------+
|[red, jim, L]  |
|[blue, john, M]|
+---------------+

initDFStruct.where("col1.color = 'blue'").show(false)

Which shows the wanted result:

+---------------+
|col1           |
+---------------+
|[blue, john, M]|
+---------------+

In summary, if you have it as a string column, you can use in your where clause where col1 like '%color=blue%' while if you have it as a struct, you where clause should be: "col1.color = 'blue'

Upvotes: 1

Related Questions