Reputation: 1
I have very large (1 billion + records) files in S3, that I am querying via Amazon Redshift using Spectrum.
I have a datatype in Redshift as follows:
map<string,struct<string_value:string,int_value:bigint,float_value:double,double_value:double>>
When I query the data like this:
SELECT
ep.value.int_value
FROM table a, a.events ep
WHERE ep.key = 'event_id'
It works, and I get a result back.
However, what I want is to be able to select the value, without needing to add anything in the where clause.
The query below, works for Athena. Is there a similar thing for Redshift?
SELECT
event_params['event_id'].int_value AS event_id
FROM table a
Thanks in advance!
I tried using dot notation in the select, but I get an error.
Update:
SELECT
(SELECT max(ep.value.int_value) from a.events as ep where b.key = 'event_id')
FROM table a
This works, but is this cost efficient? Given the millions of rows I need to do analysis on...
Upvotes: 0
Views: 126
Reputation: 11032
Your first query is “unnesting” the array “events” in the FROM clause - “FROM table a, a.events ep“. See https://docs.aws.amazon.com/redshift/latest/dg/query-super.html. Basically this is mapping all the values of the events array into rows.
It is not clear why you cannot just remove the WHERE clause. You will likely get some NULL values where events isn’t defined but cleaning that up is straight forward.
Upvotes: 0