iskandarblue
iskandarblue

Reputation: 7526

Extracting messy JSON values in PrestoDB

I have an json column in a PrestoDB table mytable. The rows contain varying amounts of key-value pairs - sometimes keys are present, and sometimes keys are absent, and sometimes the JSON contained in the row is wrapped in quotes, and sometimes it is not:

SELECT mycolumm from mytable 
ORDER BY id DESC
LIMIT 3


{u'foo': 123,  u'bar': u'abc', u'baz': 456}
{u'bar': u'abc'}
"{u'baz': 456}"

My objective is to get only the values of rows that contain the key baz However, the following method does not work:

SELECT
JSON_EXTRACT_SCALAR(mycolumn, '$["baz"]')
from mytable
WHERE JSON_EXTRACT_SCALAR(mycolumn, '$["baz"]') IS NOT NULL
ORDER BY id
DESC

It should return

456
456

But the query returns nothing. What am I doing wrong?

Upvotes: 0

Views: 788

Answers (1)

jpavs
jpavs

Reputation: 658

I think that your syntax is incorrect. Try this:

   select
   JSON_EXTRACT_SCALAR(mycolumn, '$.baz')
   from mytable
   WHERE JSON_EXTRACT_SCALAR(mycolumn, '$.baz') IS NOT NULL
   ORDER BY ID DESC

You can see how json_path works here: https://github.com/json-path/JsonPath#path-examples. This is the notation that presto uses to navigate json strings.

Upvotes: 1

Related Questions