Reputation: 1095
I have several connections to Snowflake issuing SQL commands including adhoc queries I run for debugging/development manually, tasks I run twice a day to make summary tables, and Chartio (a dashboarding application) running interval queries against mostly my summary tables.
I’m using a lot more credits lately primarily due to computational resources. I could segment the different connections to different warehouses in order to isolate which of these distinct users are incurring the most credits, but was hoping to use Snowflake directly to correlate who is making which calls at the hours corresponding to the most credits. It doesn’t have to be a fully automated approach, I can do the legwork, I’m just unsure how to do this without segmenting the warehouses which would take a bit of work and uncertainty since it affects production.
One of the definite steps I took that should help is reducing the size of my warehouse that serves these queries. But I’m unsure how to segment and isolate what’s incurring the most cost here more definitely.
alter warehouse my_wh set warehouse_size=medium;
I’ve read about resource monitors in their documentation here but it doesn’t help much with the isolation aspect: https://docs.snowflake.com/en/user-guide/resource-monitors.html
Ultimately what is best would be to have a resource monitor by user so I know for sure the source, including the tables the statements were issued to.
Upvotes: 0
Views: 1007
Reputation: 11046
It's more a process than a single event or piece of code, but here's a SQL query that can help. To isolate credit consumption cleanly, you need separate warehouses. It is possible, however, to estimate the credit consumption over time by user. It's an estimate because a warehouse is a shared resource, and since two or more users can be using a warehouse simultaneously the best we can do is figure a way to apportion who's responsible for what part of that consumption.
The following query estimates credit consumption by user over time using the following approach:
#3 is the approximation, but it's suitable if you don't use it for chargebacks or billing someone for data share usage.
Be sure to change the warehouse name to your WH name and set the start and end timestamps for the duration you'd like to check usage.
with
TF(WH_NAME, START_DATE, END_DATE) as
(
-- Warehouse Start Date End Date
select 'TEST', '2021-04-01', '2021-05-01' -- Replace values here
),
WH(START_TIME, END_TIME, CREDITS_USED) as
(
select START_TIME, END_TIME, CREDITS_USED
from "SNOWFLAKE"."ACCOUNT_USAGE"."METERING_HISTORY", TF
where "NAME" = TF.WH_NAME and
"START_TIME" >= TF.START_DATE and
"END_TIME" <= TF.END_DATE
),
QRY(USER_NAME, START_TIME, END_TIME, DURATION) as
(
select USER_NAME, START_TIME, END_TIME, TOTAL_ELAPSED_TIME
from "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY", TF
where "WAREHOUSE_NAME" = TF.WH_NAME and
"START_TIME" >= TF.START_DATE and
"END_TIME" <= TF.END_DATE
),
DUR(USER_NAME, WH_START_TIME, WH_END_TIME, DURATION) as
(
select QRY.USER_NAME as USER_NAME
,WH.START_TIME as WH_START_TIME
,WH.END_TIME as WH_END_TIME
,sum(QRY.DURATION) as QRY_DURATION
from WH
left join QRY on
QRY.START_TIME > WH.START_TIME and
QRY.END_TIME < WH.END_TIME
group by WH_START_TIME, WH_END_TIME, USER_NAME
),
WHBU (WH_START_TIME, WH_END_TIME, CREDITS, DURATION, USER_NAME) as
(
select WH.START_TIME as WH_START_TIME
,WH.END_TIME as WH_END_TIME
,WH.CREDITS_USED as CREDITS
,DUR.DURATION as DURATION
,DUR.USER_NAME as USER_NAME
from WH
left join DUR on WH.START_TIME = DUR.WH_START_TIME and
WH.END_TIME = DUR.WH_END_TIME
order by WH.START_TIME, USER_NAME
),
CBU(USER_NAME, CREDITS) as
(
select USER_NAME
,CREDITS * DURATION / sum(DURATION) over (partition by WH_START_TIME, WH_END_TIME) as PRO_RATED_CREDITS
from WHBU
)
select USER_NAME
,sum(CREDITS) as ESTIMATED_CREDITS
from CBU
group by USER_NAME
order by ESTIMATED_CREDITS desc
;
Upvotes: 3