Sluna
Sluna

Reputation: 189

Regex in Amazon Redshift using SQL

I'm using sql in amazon redshift. I have a column called attributes which looks like this:

[{"name": "Size", "value": "Small"},{"name": "Color", "value": "Blue"},{"name": "Material", "value": "Cotton"}]

or

[{"name": "Material", "value": "silk"},{"name": "Color", "value": "Pink"}]

In order to extract the material value, I'm trying to use regex. I have the initial code such:

select REGEXP_SUBSTR(attributes,'Material.*') as string_value

Result:

Material", "value": "Cotton"}] 

and

Material", "value": "silk"},{"name": "Color", "value": "Pink"}]

How can I take this a step further and get just the value of material, so output should look like:

Cotton

silk

I'm new to sql and new to regex so would really appreciate your help.

Upvotes: 1

Views: 187

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520898

You should probably not be resorting to using regex to parse JSON content, especially given that your JSON content is nested inside an array. I am admittedly not an expert in the Redshift JSON API, but this SO question was very helpful. Consider the following approach:

CREATE VIEW seq_0_to_3 AS
    SELECT 0 AS i UNION ALL                                      
    SELECT 1 UNION ALL
    SELECT 2          
);

WITH exploded_array AS (                                                                          
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(json_col, seq.i) AS json
    FROM yourTable, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(json_col)
)

SELECT JSON_EXTRACT_PATH_TEXT(json, 'value') AS Material
FROM exploded_array
WHERE JSON_EXTRACT_PATH_TEXT(json, 'name') = 'Material';

This answer assumes that your JSON arrays would always have 3 elements for the size, color, and material. If the count would differ from 3, then update the view seq_0_to_3 above with the correct number of elements.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269473

You can use regular expressions and capture groups:

select REGEXP_SUBSTR(attributes, '"Material", "value": "([^"]*)"') as string_value

Upvotes: 1

Related Questions