saurabh
saurabh

Reputation: 31

Snowflake SQL Query taking too much time to run when trying to apply multiple joins

I am trying to work with a sql query on snowflake where I am trying to join multiple tables but my query is taking forever to run, I am not sure whether its my query or may be I have taken some wrong approach.

I have these below tables in snowflake -

1)RR_SUMM, 2) YY_TEXTENTR, 3) KK_SUBEVT, 4) LG_NBETR, 5) XX_RPOPO, 6) VV_KJIU, 7) LL_JJHHHIP, 8) UU_GHGGHJ,
9) QQ_BHBHGGG, 10) TT_HJHHSY

So RR_SUMM is my primary table

and each table consists of a common column labelled as "_ID"

My Goal is to join all the other 9 tables with primary table RR_SUMM using _ID column

as I am trying to extract and combine some of the fields from each table with the primary table.

I am following the approach of applying left outer join to combine all the other tables with primary table RR_SUMM

But my approach is taking forever to run as most of the tables are of around 25 GB in size.

SQL query Which I have written in SNOWFLAKE is below-

INSERT INTO  "PRD"."POST"."_ALL_EVENTS" 
SELECT
DISTINCT SUMMARY._ID,
SUMMARY.GEP_ID,
SUMMARY.AK_CD,
SUMMARY.AK1_CD,
SUMMARY.AK2_CD,
SUMMARY.JJ_DT,
SUMMARY.IL_OVRD,
SUMMARY.STRT_DT,
SUMMARY.EVENT_DT,
SUMMARY.PUNLICATION_DT,
SUMMARY.END_DT,
SUMMARY.END_1DT,
SUMMARY.OO_IND,
SUMMARY.EXPN_DT,
SUMMARY.STATHJJ_CD,
SUMMARY.STATHJJ_DT,
SUMMARY.ERSK_CD,
SUMMARY.DSRP_NBR,
SUMMARY.LNBR,
SUMMARY.LK_REF,
SUMMARY.OOLDESC_CD,
SUMMARY.LMN_CD,
TEXT.UTXCT,
TEXT.GL_CD,
SB.MN_CD,
SB.MN_DT,
SB.EVTEXT,
SB._START_DATE,
SB._END_DATE,
RE.RRONBT,
RE.NN_CD,
RP.RP_CD,
RP.RP_T_CD,
RP.RNME,
PP.FNBR,
PP.FDESC_CD,
IP.FL_DT,
IP.FL_DTTYPCD,
XP.JJ_DT,
XP.OO_CD,
OP.ORG_REF,
OP.FL_NBR,
KP.EVK_CD,
KP.EVJK_DT

 FROM 
 
"PRD"."POST"."RR_SUMM" SUMMARY
 
 LEFT OUTER JOIN 
 
 "PRD"."POST"."YY_TEXTENTR" TEXT
 
 ON TEXT._ID = SUMMARY._ID
 
 LEFT OUTER JOIN 
 
 "PRD"."POST"."KK_SUBEVT" SB
 
 ON  SB._ID = SUMMARY._ID
 
 LEFT OUTER JOIN 
 
"PRD"."POST"."LG_NBETR" RE
 
 ON RE._ID = SUMMARY._ID
 
 LEFT OUTER JOIN 
 
 "PRD"."POST"."XX_RPOPO" RP
 
 ON RP._ID = SUMMARY._ID
 
 LEFT OUTER JOIN 
 
 "PRD"."POST"."VV_KJIU" PP
 
  ON PP._ID = SUMMARY._ID
 
 LEFT OUTER JOIN 
 
 "PRD"."POST"."LL_JJHHHIP" IP
 
  ON IP._ID = SUMMARY._ID
 
  LEFT OUTER JOIN 
 
 "PRD"."POST"."UU_GHGGHJ" XP
 
  
  ON XP._ID = SUMMARY._ID
   
 LEFT OUTER JOIN 
 
 "PRD"."POST"."QQ_BHBHGGG" OP
 
 
  ON OP._ID = SUMMARY._ID
   
  LEFT OUTER JOIN 
 
 
"PRD"."POST"."TT_HJHHSY" KP
 
 ON KP._ID = SUMMARY._ID
 


GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44;

Please let me know if there is any other way to my problem which I can follow to run my query quickly. I cannot limit my data as I need all of the data for my analysis.

Any help will be really appreciated.

Thanks

Upvotes: 2

Views: 1716

Answers (2)

Eric Lin
Eric Lin

Reputation: 1510

There are lots of reasons a query can run slowly, JOIN ORDER, data skew, bad cardinality estimate, Warehouse size etc to name a few.

Since you have so many JOINs, without looking into the query profile, it would be hard to say.

Best way is to open a support ticket for review.

Upvotes: 0

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

At one level GROUPING by all columns is the same as DISTINCT.

But given you are rolling it all up, to only get one of each, you can push the DISTINCTs lower into the queries, and the joins should have not duplicate values

SELECT
    summary._id,
    summary.gep_id,
    summary.ak_cd,
    summary.ak1_cd,
    summary.ak2_cd,
    summary.jj_dt,
    summary.il_ovrd,
    summary.strt_dt,
    summary.event_dt,
    summary.punlication_dt,
    summary.end_dt,
    summary.end_1dt,
    summary.oo_ind,
    summary.expn_dt,
    summary.stathjj_cd,
    summary.stathjj_dt,
    summary.ersk_cd,
    summary.dsrp_nbr,
    summary.lnbr,
    summary.lk_ref,
    summary.ooldesc_cd,
    summary.lmn_cd,
    text.utxct,
    text.gl_cd,
    sb.mn_cd,
    sb.mn_dt,
    sb.evtext,
    sb._start_date,
    sb._end_date,
    re.rronbt,
    re.nn_cd,
    rp.rp_cd,
    rp.rp_t_cd,
    rp.rnme,
    pp.fnbr,
    pp.fdesc_cd,
    ip.fl_dt,
    ip.fl_dttypcd,
    xp.jj_dt,
    xp.oo_cd,
    op.org_ref,
    op.fl_nbr,
    kp.evk_cd,
    kp.evjk_dt
FROM ( 
    SELECT DISTINCT
        summary._id,
        summary.gep_id,
        summary.ak_cd,
        summary.ak1_cd,
        summary.ak2_cd,
        summary.jj_dt,
        summary.il_ovrd,
        summary.strt_dt,
        summary.event_dt,
        summary.punlication_dt,
        summary.end_dt,
        summary.end_1dt,
        summary.oo_ind,
        summary.expn_dt,
        summary.stathjj_cd,
        summary.stathjj_dt,
        summary.ersk_cd,
        summary.dsrp_nbr,
        summary.lnbr,
        summary.lk_ref,
        summary.ooldesc_cd,
        summary.lmn_cd
    FROM prd.post.rr_summ AS summary
) AS summary
LEFT OUTER JOIN (
    SELECT DISTINCT
        text._id
        text.utxct,
        text.gl_cd
    FROM prd.post.yy_textentr AS text
) AS text
    ON text._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        sb._id
        sb.mn_cd,
        sb.mn_dt,
        sb.evtext,
        sb._start_date,
        sb._end_date
    FROM prd.post.kk_subevt AS sb
) AS sb
    ON sb._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        re._id
        re.rronbt,
        re.nn_cd
    FROM prd.post.lg_nbetr AS re
) AS re
    ON re._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        rp._id
        rp.rp_cd,
        rp.rp_t_cd,
        rp.rnme
    FROM    
 prd.post.xx_rpopo AS rp
) AS rp
    ON rp._id = summary._id
 LEFT OUTER JOIN(
    SELECT DISTINCT
        pp._id
        pp.fnbr,
        pp.fdesc_cd
    FROM prd.post.vv_kjiu AS pp
) AS pp
    ON pp._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        ip._id
        ip.fl_dt,
        ip.fl_dttypcd
    FROM prd.post.ll_jjhhhip AS ip
) AS ip
    ON ip._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        xp._id
        xp.jj_dt,
        xp.oo_cd
    FROM prd.post.uu_ghgghj AS xp
) AS xp  
    ON xp._id = summary._id 
LEFT OUTER JOIN (
    SELECT DISTINCT
        op._id,
        op.org_ref,
        op.fl_nbr
    FROM prd.post.qq_bhbhggg AS op
) AS op
    ON op._id = summary._id
LEFT OUTER JOIN (
    SELECT DISTINCT
        kp._id
        kp.evk_cd,
        kp.evjk_dt
    FROM prd.post.tt_hjhhsy AS kp
) AS kp
    ON kp._id = summary._id;

So this should be much faster.

Upvotes: 1

Related Questions