Rahi c
Rahi c

Reputation: 1

Segment compilation is happening for most of our REDSHIFT queries which is adding to our query performance

We are running range restricted query in our redshift where only the employee_id and business_date keep changing .But we keep seeing in SVL_COMPILE our hourly segment compilation is very high .Most of the queries are getting compiled which is adding additional time to the query.

This query keep getting executed with different employee_id and business_date range by our product team .

  1. Hourly hit : 1500 queries per hour.

  2. Date Format for business_date field: : yyyy-MM-dd

  3. tillTime is a timestamp field with constant value as 9999-01-01 01:00:00 . We had keep tillTime in query as we set tillTime to current date while milestoning the entry.Only active records will have the value as 9999-01-01 01:00:00.

  4. Queries are getting executing from an spring boot app Deployed in EC2 having instance count as 10.

  5. employee_id : Can go from 1 to 5 employee_id max.

  6. business_date range from 1 day to 30 days

  7. columns: We have 216 columns in table. So the db column in select queries are dynamic based on user selection.We only fetch those columns that user has requested.

Query:

Select coulmns from acconts table where   business_date between (?) and (?) and employee_id in (?) and  sysdate >tillTime.

Our table configuration is:

Sortkey: (SORTKEY(business_date,employee_id ,snap_type,tillTime))
Diststyle: AUTO(EVEN)

Our cluster configuration is:

Node type: ra3.xlplue
Number of nodes: 10

We tried below:

Query:

Select coulmns  from acconts table where business_date between (?) and (?) and  emmployee_id in (?) and 
tillTime='9999-01-01 01:00:00'

Still not much improvement observed . Any body can guide here on why segment compiled when sysdate >tillTime or when tillTime='9999-01-01 01:00:00'

Upvotes: 0

Views: 68

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11082

Lots of possibilities here so going to throw out some ideas to explore. More info needed to narrow down what to focus on.

My first guess is that the compiler algorithm is changing between runs due to options being sent to Redshift. The IN clause can be compiled differently depending on how many values are in the IN clause. A few values and this will compile as a list of OR conditions. Many values and the compiler will use a pseudo-join structure. The plans are different and deciding to change would cause a recompile.

This type of significant change in is what you are likely looking for. My guess is just that a guess. Take a look at the explain plans before and after a recompile and see what changes between them. Clues are likely needed.

Your table definition and state of metadata validity are also important. This info is used to decide which compiler algorithm to use so if it gets meaningfully out of date decisions can change. Is your table changing every hour? How often are you updating and analyzing?

Upvotes: 0

Related Questions