martina.physics
martina.physics

Reputation: 9804

Redshift cluster: queries getting hang and filling up space

I have a Redshift cluster with 3 nodes. Every now and then, with users running queries against it, we end in this unpleasant situation where some queries run for way longer than expected (even simple ones, exceeding 15 minutes), and the cluster storage starts increasing to the point that if you don't terminate the long-standing queries it gets to 100% storage occupied.

I wonder why this may happen. My experience is varied, sometimes it's been a single query doing this and sometimes it's been different concurrent queries been run at the same time.

Upvotes: 0

Views: 1672

Answers (2)

Ying
Ying

Reputation: 106

One specific scenario where we saw this happen related to LISTAGG. The type of LISTAGG is varchar(65535), and while Redshift optimizes away the implicit trailing blanks when stored to disk, the full width is required in memory during processing.

If you have a query that returns a million rows, you end up with 1,000,000 rows times 65,535 bytes per LISTAGG, which is 65 gigabytes. That can quickly get you into a situation like what you describe, with queries taking unexpectedly long or failing with “Disk Full” errors.

My team discussed this a bit more on our team blog the other day.

Upvotes: 4

Joe Harris
Joe Harris

Reputation: 14035

This typically happens when a poorly constructed query spills a too much data to disk. For instance the user accidentally specifies a Cartesian product (every row from tblA joined to every row of tblB).

If this happens regularly you can implement a QMR rule that limits the amount of disk spill before a query is aborted.

Upvotes: 1

Related Questions