Simon123

Reputation: 11

Greenplum error "Canceling query because of high VMEM usage."

We have a small Greenplum cluster in which some queries abort.

System related information:

Greenplum Version: 6.3
Master hosts: 1
Segment hosts: 2
RAM per segment host: 32GB
SWAP per segment host: 32GB
Total segments: 8 primary + 0 mirrors
Segments per host: 4

vm.overcommit_ratio: 95
gp_vmem_protect_limit: 8072MB
statement_mem: 250MB
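
For reference, gp_vmem_protect_limit was derived roughly from the formula in the Greenplum best-practices guide; this is only a back-of-the-envelope check, assuming we read the formula on that page correctly:

-- per segment host: 32GB RAM, 32GB swap, 4 acting primary segments
select ((32 + 32) - (7.5 + 0.05 * 32)) / 1.7     as gp_vmem_gb;        -- ~32.3 GB usable per host
select ((32 + 32) - (7.5 + 0.05 * 32)) / 1.7 / 4 as protect_limit_gb;  -- ~8 GB per primary (our 8072MB)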

The queries are executed by a non-superuser role.

Symptom:

The query fails with the following error message:

Canceling query because of high VMEM usage. Used: 7245MB, available 801MB, red zone: 7264MB (runaway_cleaner.c:189) 

What we tried:

  1. We calculated the Greenplum memory parameters with this information: https://gpdb.docs.pivotal.io/6-3/best_practices/sysconfig.html

    This helped us for some "simple" queries, but for more complicated ones the error happened again.

  2. As a next step we configured max_statement_mem: 2000MB

    This didn't have any effect on the memory consumption on the segment hosts. We track the consumption with the following query:

select segid, sum (vmem_mb) from session_state.session_level_memory_consumption
where query like '%<some snippet of the query>%'
group by segid
order by segid;

The memory consumption increases very quickly and the error happened again.

  3. We tried to restrict the memory consumption by setting the following resource queue for the user:
CREATE RESOURCE QUEUE adhoc with (ACTIVE_STATEMENTS=6, MEMORY_LIMIT=6291);
ALTER ROLE user1 RESOURCE QUEUE adhoc;

The database is configured to use resource queues via the parameter gp_resource_manager: queue

When we execute a statement we see in the table 'gp_toolkit.gp_resqueue_status' that 'rsqmemoryvalue' is 1048 (presumably MEMORY_LIMIT / ACTIVE_STATEMENTS = 6291 / 6), but the memory consumption in the session_state.session_level_memory_consumption table still shows higher values for the segments until the error occurs again.
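
This is roughly the check we run while the statement is executing (a sketch; the column names are from gp_toolkit as we understand them, and the queue name is ours):

select rsqname, rsqcountvalue, rsqmemorylimit, rsqmemoryvalue
from gp_toolkit.gp_resqueue_status
where rsqname = 'adhoc';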

Does anyone have a tip to fix this problem?

Upvotes: 1

Views: 6353

Answers (1)

Sung Yu-wei

Reputation: 161

Each query will ask for 250MB of memory, and you set gp_vmem_protect_limit to 8GB. In this case you can probably run (8GB - primary process memory) / 250MB, i.e. roughly 20-30 queries, at the same time. The size of the primary process depends on other settings: shared_buffers, wal_buffers, ...
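
As a rough illustration only (the 1GB primary-process overhead below is a guess, not a measured value):

-- per segment: protect limit minus an assumed ~1GB primary-process overhead,
-- split into 250MB statement quotas
select (8072 - 1024) / 250 as concurrent_statements;   -- ~28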

statement_mem can also be set in a session. This means some users can set statement_mem higher (up to max_statement_mem), and you will then see fewer queries running concurrently.
When the memory allocated to those concurrent queries reaches 90% (or 95%) of gp_vmem_protect_limit, the runaway detector starts to cancel queries to protect the primary processes from being killed by the OS OOM killer.
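
To illustrate the per-session setting mentioned above (1000MB is only an example; the value is capped by max_statement_mem):

set statement_mem = '1000MB';   -- per-session override, up to max_statement_mem
show statement_mem;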

To "fix" the problem (it is not a problem actually), you can

1) set a lower default statement_mem, so you can have more queries running concurrently, but each one slower (see the sketch after this list).

2) increase the RAM on the segment hosts, so that you can raise gp_vmem_protect_limit.
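
For option 1, a minimal sketch: the default can be lowered cluster-wide with gpconfig, or per role as below (the role name and the 125MB value are only examples):

alter role user1 set statement_mem = '125MB';   -- new sessions of this role start with a lower quota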

Upvotes: 0
