Reputation: 31
I am trying to work with a sql query on snowflake where I am trying to join multiple tables but my query is taking forever to run, I am not sure whether its my query or may be I have taken some wrong approach.
I have these below tables in snowflake -
1)RR_SUMM, 2) YY_TEXTENTR, 3) KK_SUBEVT, 4) LG_NBETR, 5) XX_RPOPO, 6) VV_KJIU, 7) LL_JJHHHIP, 8) UU_GHGGHJ,
9) QQ_BHBHGGG, 10) TT_HJHHSY
So RR_SUMM is my primary table
and each table consists of a common column labelled as "_ID"
My Goal is to join all the other 9 tables with primary table RR_SUMM using _ID column
as I am trying to extract and combine some of the fields from each table with the primary table.
I am following the approach of applying left outer join to combine all the other tables with primary table RR_SUMM
But my approach is taking forever to run as most of the tables are of around 25 GB in size.
SQL query Which I have written in SNOWFLAKE is below-
INSERT INTO "PRD"."POST"."_ALL_EVENTS"
SELECT
DISTINCT SUMMARY._ID,
SUMMARY.GEP_ID,
SUMMARY.AK_CD,
SUMMARY.AK1_CD,
SUMMARY.AK2_CD,
SUMMARY.JJ_DT,
SUMMARY.IL_OVRD,
SUMMARY.STRT_DT,
SUMMARY.EVENT_DT,
SUMMARY.PUNLICATION_DT,
SUMMARY.END_DT,
SUMMARY.END_1DT,
SUMMARY.OO_IND,
SUMMARY.EXPN_DT,
SUMMARY.STATHJJ_CD,
SUMMARY.STATHJJ_DT,
SUMMARY.ERSK_CD,
SUMMARY.DSRP_NBR,
SUMMARY.LNBR,
SUMMARY.LK_REF,
SUMMARY.OOLDESC_CD,
SUMMARY.LMN_CD,
TEXT.UTXCT,
TEXT.GL_CD,
SB.MN_CD,
SB.MN_DT,
SB.EVTEXT,
SB._START_DATE,
SB._END_DATE,
RE.RRONBT,
RE.NN_CD,
RP.RP_CD,
RP.RP_T_CD,
RP.RNME,
PP.FNBR,
PP.FDESC_CD,
IP.FL_DT,
IP.FL_DTTYPCD,
XP.JJ_DT,
XP.OO_CD,
OP.ORG_REF,
OP.FL_NBR,
KP.EVK_CD,
KP.EVJK_DT
FROM
"PRD"."POST"."RR_SUMM" SUMMARY
LEFT OUTER JOIN
"PRD"."POST"."YY_TEXTENTR" TEXT
ON TEXT._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."KK_SUBEVT" SB
ON SB._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."LG_NBETR" RE
ON RE._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."XX_RPOPO" RP
ON RP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."VV_KJIU" PP
ON PP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."LL_JJHHHIP" IP
ON IP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."UU_GHGGHJ" XP
ON XP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."QQ_BHBHGGG" OP
ON OP._ID = SUMMARY._ID
LEFT OUTER JOIN
"PRD"."POST"."TT_HJHHSY" KP
ON KP._ID = SUMMARY._ID
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44;
Please let me know if there is any other way to my problem which I can follow to run my query quickly. I cannot limit my data as I need all of the data for my analysis.
Any help will be really appreciated.
Thanks
Upvotes: 2
Views: 1716
Reputation: 1510
There are lots of reasons a query can run slowly, JOIN ORDER, data skew, bad cardinality estimate, Warehouse size etc to name a few.
Since you have so many JOINs, without looking into the query profile, it would be hard to say.
Best way is to open a support ticket for review.
Upvotes: 0
Reputation: 25903
At one level GROUPING by all columns is the same as DISTINCT.
But given you are rolling it all up, to only get one of each, you can push the DISTINCTs lower into the queries, and the joins should have not duplicate values
SELECT
summary._id,
summary.gep_id,
summary.ak_cd,
summary.ak1_cd,
summary.ak2_cd,
summary.jj_dt,
summary.il_ovrd,
summary.strt_dt,
summary.event_dt,
summary.punlication_dt,
summary.end_dt,
summary.end_1dt,
summary.oo_ind,
summary.expn_dt,
summary.stathjj_cd,
summary.stathjj_dt,
summary.ersk_cd,
summary.dsrp_nbr,
summary.lnbr,
summary.lk_ref,
summary.ooldesc_cd,
summary.lmn_cd,
text.utxct,
text.gl_cd,
sb.mn_cd,
sb.mn_dt,
sb.evtext,
sb._start_date,
sb._end_date,
re.rronbt,
re.nn_cd,
rp.rp_cd,
rp.rp_t_cd,
rp.rnme,
pp.fnbr,
pp.fdesc_cd,
ip.fl_dt,
ip.fl_dttypcd,
xp.jj_dt,
xp.oo_cd,
op.org_ref,
op.fl_nbr,
kp.evk_cd,
kp.evjk_dt
FROM (
SELECT DISTINCT
summary._id,
summary.gep_id,
summary.ak_cd,
summary.ak1_cd,
summary.ak2_cd,
summary.jj_dt,
summary.il_ovrd,
summary.strt_dt,
summary.event_dt,
summary.punlication_dt,
summary.end_dt,
summary.end_1dt,
summary.oo_ind,
summary.expn_dt,
summary.stathjj_cd,
summary.stathjj_dt,
summary.ersk_cd,
summary.dsrp_nbr,
summary.lnbr,
summary.lk_ref,
summary.ooldesc_cd,
summary.lmn_cd
FROM prd.post.rr_summ AS summary
) AS summary
LEFT OUTER JOIN (
SELECT DISTINCT
text._id
text.utxct,
text.gl_cd
FROM prd.post.yy_textentr AS text
) AS text
ON text._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
sb._id
sb.mn_cd,
sb.mn_dt,
sb.evtext,
sb._start_date,
sb._end_date
FROM prd.post.kk_subevt AS sb
) AS sb
ON sb._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
re._id
re.rronbt,
re.nn_cd
FROM prd.post.lg_nbetr AS re
) AS re
ON re._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
rp._id
rp.rp_cd,
rp.rp_t_cd,
rp.rnme
FROM
prd.post.xx_rpopo AS rp
) AS rp
ON rp._id = summary._id
LEFT OUTER JOIN(
SELECT DISTINCT
pp._id
pp.fnbr,
pp.fdesc_cd
FROM prd.post.vv_kjiu AS pp
) AS pp
ON pp._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
ip._id
ip.fl_dt,
ip.fl_dttypcd
FROM prd.post.ll_jjhhhip AS ip
) AS ip
ON ip._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
xp._id
xp.jj_dt,
xp.oo_cd
FROM prd.post.uu_ghgghj AS xp
) AS xp
ON xp._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
op._id,
op.org_ref,
op.fl_nbr
FROM prd.post.qq_bhbhggg AS op
) AS op
ON op._id = summary._id
LEFT OUTER JOIN (
SELECT DISTINCT
kp._id
kp.evk_cd,
kp.evjk_dt
FROM prd.post.tt_hjhhsy AS kp
) AS kp
ON kp._id = summary._id;
So this should be much faster.
Upvotes: 1