Nishant Dixit
Nishant Dixit

Reputation: 5522

Remove duplicate record using Unnest | Aws Athena

I am facing issue while filtering data with array I have columns userid,event_name,attributes,ti

Attributes column have value like this

{"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}

My query

SELECT * FROM "events-data" CROSS JOIN UNNEST(CAST(json_extract(attributes, '$["array_int"]') AS array<int>)) AS t(val)
WHERE val > 9; 

This query filtering data but it giving me multiple row for same record, for above attributes column record it giving me 4 rows

userid,event_name,attributes,ti,val
test-userid,test-event,test-attributes,10
test-userid,test-event,test-attributes,20
test-userid,test-event,test-attributes,25
test-userid,test-event,test-attributes,38

I do not need multiple rows

Upvotes: 0

Views: 54

Answers (2)

Guru Stron
Guru Stron

Reputation: 142833

That is what UNNEST does - it expands array in multiple rows (one row per array element, see the docs).

If the goal is to fetch rows with array_int containing values more than 9 then you can use JSON functions, for example json_exists:

-- sample data
WITH dataset(userid,event_name,attributes) as (
    values ('test-userid', 'test-event', '{"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}')
)

-- query
select *
from dataset
where json_exists(attributes, 'lax $.array_int[*]?(@ > 9)');

Output:

userid event_name attributes
test-userid test-event {"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}

If you need to get the array itself with filtered data then you can use the casting+filtering:

-- query
select userid, event_name, vals
from (
    select *, filter(cast(json_extract(attributes, '$.array_int') as array(int)), v -> v > 9) vals
    from dataset)
where cardinality(vals) > 0;

Output:

userid event_name vals
test-userid test-event {10,20,25,38}

Upvotes: 0

ValNik
ValNik

Reputation: 5731

No need CROSS JOIN with UNNEST extracted data. Check exists for.

Try

SELECT * FROM "events-data" 
WHERE exists (
    select val 
    from UNNEST(CAST(json_extract_path(attributes, '$.["array_int"]') AS int[])) AS t(val) 
    where val>9)
; 

Upvotes: 0

Related Questions