Nick Edwards
Nick Edwards

Reputation: 1

I need to access values from a MAP datatype in Redshift Spectrum

I have very large (1 billion + records) files in S3, that I am querying via Amazon Redshift using Spectrum.

I have a datatype in Redshift as follows:

map<string,struct<string_value:string,int_value:bigint,float_value:double,double_value:double>>

When I query the data like this:

SELECT 
ep.value.int_value
FROM table a, a.events ep
WHERE ep.key = 'event_id' 

It works, and I get a result back.

However, what I want is to be able to select the value, without needing to add anything in the where clause.

The query below, works for Athena. Is there a similar thing for Redshift?

SELECT
event_params['event_id'].int_value AS event_id
FROM table a

Thanks in advance!

I tried using dot notation in the select, but I get an error.

Update:

SELECT
(SELECT max(ep.value.int_value) from a.events as ep where b.key = 'event_id')
FROM table a

This works, but is this cost efficient? Given the millions of rows I need to do analysis on...

Upvotes: 0

Views: 126

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11032

Your first query is “unnesting” the array “events” in the FROM clause - “FROM table a, a.events ep“. See https://docs.aws.amazon.com/redshift/latest/dg/query-super.html. Basically this is mapping all the values of the events array into rows.

It is not clear why you cannot just remove the WHERE clause. You will likely get some NULL values where events isn’t defined but cleaning that up is straight forward.

Upvotes: 0

Related Questions