supernova
supernova

Reputation: 2070

AWS Athena unnest with left join not working

I have a nested JSON structure where I want to unnest one JSON subtree. Either A or B is filled and event.type is marked as 'A' or 'B'. Here's two simplified examples:

{
  "event": {
        "event_type": "A",
        "time": 1599692445083,
        "A" : {
            "name": "item1",
            "revenue": 100
        }
      }
  }
}

{
  "event": {
        "event_type": "B",
        "time": 1599692445083,
        "B" : {
            "items" : [
                {"name": "item2", revenue" : 10},
                {"name": "item3", revenue" : 20},
            ]
        }
      }
  }
}

Query works so far, but I have the issue that UNNEST just works with cross join and thus I'm loosing all "A" events as I would need left join. Please note I do multiple aggregates, so calculating the sum alone on an easier way would be sufficient.

select
    from_unixtime( (floor(event.time/1000) / (60 * 60))  *60*60) as event_hour,
    count(*) filter(where event.event_type = 'A') as A_items,
    count(*) filter(where event.event_type = 'B') as B_items
FROM mydb.event_table
   left join unnest(event.B.items) as t(b)
WHERE
    year=2020 and month=9 and day=18 and hour=1
GROUP BY
    from_unixtime( (floor(event.time/1000) / (60 * 60))  *60*60),

For cross join unnest query executes, but as A does not have an item Array I do not get any entries counted. For left join unnest I got an error message (funny enough it states 'left' there as well):

mismatched input 'where' expecting {'join', 'cross', 'inner', 'left', 'right', 'full', 'natural', 'using', 'on', 'tablesample'}

Is there a way to use left join in unnest in Athena?

Upvotes: 1

Views: 5488

Answers (2)

Piotr Findeisen
Piotr Findeisen

Reputation: 20770

Athena is based on Presto .172 LEFT JOIN with UNNEST was added in Presto 319

Add support for INNER and OUTER joins involving UNNEST.

I would assume Athena does not support this until they

  • upgrade to newer Presto release
  • backport the feature to their fork

In the meantime, you can run latest greatest version of Presto:

Also, as @GMB pointed out, the JOIN requires the ON clause (can be as simple as ON true).

Upvotes: 4

supernova
supernova

Reputation: 2070

There is a work around for the mentioned use case to not loose entries when UNNESTING using cross join. It looks ugly but does the job until Athena will support newer Presto functionality.

Instead of

   left join unnest(event.B.items) as t(b)

one can rewrite the left join to a cross join with coalesce:

   cross join unnest(coalesce(event.B.items, array[null])) as t(b)

Upvotes: 2

Related Questions