Vladislava Gonchar
Vladislava Gonchar

Reputation: 109

How to remove all \ from nested json in SQL Redshift?

I've got some problems with extracting values from nested json values in column. I've got a column of data with values that looks almost like nested json, but some of jsons got \ between values and I need to clean them.

JSON looks like this:

{"mopub_json":
   "{\"currency\":\"USD\",
   \"country\":\"US\",
   \"publisher_revenue\":0.01824}
"}

I need to get currency and publisher revenue as different columns and try this:

SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, "\t",  '')
FROM table_name)

I receive the next error:

[Amazon](500310) Invalid operation: column "\t" does not exist in events

When I'm trying this:

SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, chr(92),  '')
FROM table_name)

I receive

Invalid operation: JSON parsing error

When I'm trying to extract values without replacing , I'm receiving empty columns.

Thank you for your help!

Upvotes: 0

Views: 399

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11057

So your json isn't valid. JSON doesn't allow multiline text strings but I expect that the issue. Based on your query I think you don't want a single key and string but the whole structure. The reason the that quotes are backslashed is because they are inside a string. The json should look like:

{
  "mopub_json": {
    "currency": "USD",
    "country": "US",
    "publisher_revenue": 0.01824
  }
}

Then the SQL you have should work.

Upvotes: 1

Related Questions