Sluna
Sluna

Reputation: 189

Extract specific key from array of jsons in Amazon Redshift

Background

I am working in Amazon Redshift database using SQL. I have a table and one of the column called attributes contains data that looks like this:

[{"name": "Color", "value": "Beige"},{"name":"Size", "value":Small"}]

or

[{"name": "Size", "value": "Small"},{"name": "Color", "value": "Blue"},{"name": "Material", "value": "Cotton"}]

From what I understand, the above is a series of path elements in a JSON string.

Issue

I am looking to extract the color value in each JSON string. I am unsure how to proceed. I know that if color was in the same location I could use the index to indicate where to extract from. But that is not the case here.

What I tried

select json_extract_array_element_text(attributes, 1) as color_value, json_extract_path_text(color_value, 'value') as color from my_table

This query works for some columns but not all as the location of the color value is different.

I would appreciate any help here as i am very new to sql and have only done basic querying. I have been using the following page as a reference

Upvotes: 1

Views: 1470

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11092

First off your data is in an array format (between [ ]), not object format (between { }). The page you mention is a function for extracting data from JSON objects, not arrays. Also array format presents challenges as you need to know the numeric position of the element you wish to extract.

Based on your example data it seems like objects is the way to go. If so you will want to reformat your data to be more like:

{"Color": "Beige", "Size": "Small"}

and

{"Size": "Small", "Color": "Blue", "Material": "Cotton"}

This conversion only works if the "name" values are unique in your data.

With this the function you selected - JSON_EXTRACT_PATH_TEXT() - will pull the values you want from the data.

Now changing data may not be an option and dealing with these arrays will make things harder and less performant. To do this you will need to expand these arrays by cross joining with a set of numbers that contain all numbers up to the maximum length of your arrays. For example for the samples you gave you will need to cross join by the values 0,1,2 so that you 3 element array can be fully extracted. You can then filter on only those rows that have a "name" of "color".

The function you will need for extracting elements from an array is JSON_EXTRACT_ARRAY_ELEMENT_TEXT() and since you have objects stored in the array you will need to run JSON_EXTRACT_PATH_TEXT() on the results.

Upvotes: 0

Related Questions