How to reduce and monitor Compute resources in Snowflake?

Question

I have several connections to Snowflake issuing SQL commands including adhoc queries I run for debugging/development manually, tasks I run twice a day to make summary tables, and Chartio (a dashboarding application) running interval queries against mostly my summary tables.

I’m using a lot more credits lately primarily due to computational resources. I could segment the different connections to different warehouses in order to isolate which of these distinct users are incurring the most credits, but was hoping to use Snowflake directly to correlate who is making which calls at the hours corresponding to the most credits. It doesn’t have to be a fully automated approach, I can do the legwork, I’m just unsure how to do this without segmenting the warehouses which would take a bit of work and uncertainty since it affects production.

One of the definite steps I took that should help is reducing the size of my warehouse that serves these queries. But I’m unsure how to segment and isolate what’s incurring the most cost here more definitely.

alter warehouse my_wh set warehouse_size=medium;

I’ve read about resource monitors in their documentation here but it doesn’t help much with the isolation aspect: https://docs.snowflake.com/en/user-guide/resource-monitors.html

Ultimately what is best would be to have a resource monitor by user so I know for sure the source, including the tables the statements were issued to.

Greg Pavlik · Accepted Answer

It's more a process than a single event or piece of code, but here's a SQL query that can help. To isolate credit consumption cleanly, you need separate warehouses. It is possible, however, to estimate the credit consumption over time by user. It's an estimate because a warehouse is a shared resource, and since two or more users can be using a warehouse simultaneously the best we can do is figure a way to apportion who's responsible for what part of that consumption.

The following query estimates credit consumption by user over time using the following approach:

Each segment in time that a warehouse runs gets logged as a row in the SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY view.
If only one user is active in the duration of that segment, the query assigns 100% of the usage to that user.
If more than one user is active in the duration of a segment, the query takes the total query run time for a user and divides it by the total query run time in that segment for all users. This pro-rates the shared warehouse by query runtime.

#3 is the approximation, but it's suitable if you don't use it for chargebacks or billing someone for data share usage.

Be sure to change the warehouse name to your WH name and set the start and end timestamps for the duration you'd like to check usage.

with
TF(WH_NAME, START_DATE, END_DATE) as
(
    --     Warehouse    Start Date      End Date
    select 'TEST',      '2021-04-01',   '2021-05-01'  -- Replace values here
),
WH(START_TIME, END_TIME, CREDITS_USED) as
(
    select  START_TIME, END_TIME, CREDITS_USED
    from    "SNOWFLAKE"."ACCOUNT_USAGE"."METERING_HISTORY", TF
    where   "NAME"        = TF.WH_NAME    and
            "START_TIME" >= TF.START_DATE and
            "END_TIME"   <= TF.END_DATE
),
QRY(USER_NAME, START_TIME, END_TIME, DURATION) as
(
    select   USER_NAME,  START_TIME, END_TIME, TOTAL_ELAPSED_TIME
    from    "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY", TF
    where   "WAREHOUSE_NAME" = TF.WH_NAME and
            "START_TIME" >= TF.START_DATE and
            "END_TIME"   <= TF.END_DATE
),
DUR(USER_NAME, WH_START_TIME, WH_END_TIME, DURATION) as
(
select   QRY.USER_NAME          as USER_NAME
        ,WH.START_TIME          as WH_START_TIME
        ,WH.END_TIME            as WH_END_TIME
        ,sum(QRY.DURATION)      as QRY_DURATION
from    WH
    left join QRY on
        QRY.START_TIME > WH.START_TIME and
        QRY.END_TIME   < WH.END_TIME
group by    WH_START_TIME, WH_END_TIME, USER_NAME
),
WHBU (WH_START_TIME, WH_END_TIME, CREDITS, DURATION, USER_NAME) as
(
select   WH.START_TIME           as WH_START_TIME
        ,WH.END_TIME             as WH_END_TIME
        ,WH.CREDITS_USED         as CREDITS
        ,DUR.DURATION            as DURATION
        ,DUR.USER_NAME           as USER_NAME
from    WH
    left join DUR on WH.START_TIME = DUR.WH_START_TIME and
                     WH.END_TIME   = DUR.WH_END_TIME
order by WH.START_TIME, USER_NAME
),
CBU(USER_NAME, CREDITS) as
(
select   USER_NAME
        ,CREDITS * DURATION / sum(DURATION) over (partition by WH_START_TIME, WH_END_TIME) as PRO_RATED_CREDITS
from    WHBU
)
select      USER_NAME
            ,sum(CREDITS) as ESTIMATED_CREDITS
from        CBU
group by    USER_NAME
order by    ESTIMATED_CREDITS desc
;

How to reduce and monitor Compute resources in Snowflake?

Answers (1)

Related Questions