In AWS Redshift, how can query cancellation be fully automated?

Question

In AWS Redshift I can terminate a query with the pg_terminate_backend(pid); command.

The problem is the pid can only be obtained after the query is ran by querying the stv_recents table.

This poses a problem in automation - How can I terminate a query (perhaps because it's exceeded a time limit) programmatically? As far as i can see, I can obtain the pid by filtering on the query, user and starttime columns (in the pid). The problem is, all queries are coming from the application user (same user) and some users may execute the same query.

Is there any way to hit some type of callback and obtain a pid at/after query execution? Otherwise, how can query cancellation be automated?

Thanks in advance.

Edgars T. · Accepted Answer

I have implemented myself similar approach and recommend it to you. I have defined groups in Redshift WLM. For example, if DB user belongs to one specific group, it can run queries without time limit, all other DB user queries will be cancelled after defined time. Drawback of this approach is that you will need to setup everything manually including max parallel queries per group. But this has to be done only once. This is config file created for our purposes:

[
  {
    "query_concurrency": 13,
    "memory_percent_to_use": 87,
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [
      "wlm_main"
    ],
    "user_group_wild_card": 0
  },
  {
    "query_concurrency": 2,
    "max_execution_time": 300000,
    "query_group": [],
    "query_group_wild_card": 0,
    "user_group": [],
    "user_group_wild_card": 0
  },
  {
    "short_query_queue": true
  }
]

It says that users assigned to group wlm_main will be assigned to first queue. All others will be assigned to second queue where max of 2 queries can be run in parallel, all others are waiting in queue. Additional to that also short query acceleration is used. As you might see - first queue has no time limit, second queue has limit of 300000 miliseconds (5 minutes).

Edit: Expanding answer regarding this specific use case.

To get pid (process id) of query submitted to database. You could add comment in front of SQL statement with unique value (hash or sequence) like this: /* d131dd02c5e6eec4 */ select ... . And then based on this you could search for it in stv_recents or stv_inflight and get pid. After acquiring pid it can be used to terminate session: pg_terminate_backend(pid)

In AWS Redshift, how can query cancellation be fully automated?

Answers (1)

Related Questions