DK5
DK5

Reputation: 302

Join error in using CTE in Hive

i have two queries which are doing almost the similar work. One does it without CTEs and one with CTEs. I am unable to figure out why the second query is giving absolutely no results while the first one is.

I have spent the last two hours trying to figure this out by trying out various joins and the same joins working in query 1 are not working in query 2. I hope someone can guide me with this.

First query (Returns results):

WITH MessageCTE AS 
    (
    SELECT dt
    , id
    , ts
    , family
    , message_type
    , to_user
    , message_id
    , class
    FROM dhruv.MessageLatencyInformation_20171210_20171125_to_20171130_02 as latencydata
    INNER JOIN dhruv.UsersOn503AndAbove_20171201_200k as required_users
    ON latencydata.to_user = required_users.user_id
    )
SELECT COUNT(DISTINCT to_user) AS Users
, AVG(latency) AS AvgLatency
, AVG(CASE WHEN latency > 0 THEN latency ELSE NULL END) AS AvgLatency_Positive
, PERCENTILE(latency, 0.5) AS 50Percentile
, PERCENTILE(latency, 0.75) AS 75Percentile
, PERCENTILE(latency, 0.8) AS 80Percentile
, PERCENTILE(latency, 0.9) AS 90Percentile
, PERCENTILE(latency, 0.95) AS 95Percentile
, PERCENTILE(latency, 0.99) AS 99Percentile
FROM
    (
    SELECT a.dt, a.to_user, (latency_dl.ts - latency_pb.ts) as latency
    FROM 
        (
        SELECT dt
        , id, ts
        , family
        , message_type
        , to_user
        , message_id
        , class
        FROM MessageCTE
        WHERE class = 'pb'
        ) as latency_pb
    INNER JOIN 
        (SELECT dt
        , id
        , ts
        , family
        , message_type
        , to_user
        , message_id
        , class
        FROM MessageCTE
        WHERE class = 'rdl'
        AND family = 'stm'
        ) as latency_rdl
    ON latency_pb.dt = latency_rdl.dt and latency_pb.to_user = latency_rdl.to_user and latency_pb.id = latency_rdl.id
    INNER JOIN
        (
        SELECT dt
        , id
        , ts
        , family
        , message_type
        , to_user
        , message_id
        , class
        FROM MessageCTE
        WHERE class = 'dl'
        ) as latency_dl
    ON latency_rdl.dt = latency_dl.dt and latency_rdl.to_user = latency_dl.to_user and latency_rdl.id = latency_dl.id) AS UserLatency;

First Query Output: First Query Output

Now Second Query, is a slight modification and all the same conditions, but for some reason it is returning no matches. Hopefully someone can guide me out, i just spent around 2 hours trying some joins out and i am unable to figure out why they are not happening.

Second Query:

WITH MessageCTE_pb AS 
    (
    SELECT dt, id, ts, to_user
    FROM 
        (
        SELECT dt, id, min(ts) as ts, to_user
        FROM dhruv.MessageLatencyInformation_20171210_20171125_to_20171130_02
        WHERE class = 'pb'
        GROUP BY dt, to_user, id
        ) as latencydata
    INNER JOIN dhruv.UsersOn503AndAbove_20171201_200k as required_users
    ON latencydata.to_user = required_users.user_id
    )
, MessageCTE_dl AS 
    (
    SELECT dt, id, ts, to_use
    FROM
        (
        SELECT dt, id, max(ts) as ts, to_user 
        FROM dhruv.MessageLatencyInformation_20171210_20171125_to_20171130_02 
        WHERE class = 'dl' 
        GROUP BY dt, to_user, id
        ) as latencydata
    INNER JOIN dhruv.UsersOn503AndAbove_20171201_200k as required_users
    ON latencydata.to_user = required_users.user_id
    )
, MessageCTE_rdl AS 
    (
    SELECT dt, id, to_user
    FROM
        (
        SELECT DISTINCT dt, id, to_user 
        FROM dhruv.MessageLatencyInformation_20171210_20171125_to_20171130_02
        WHERE class = 'rdl' 
        AND family = 'stm'
        ) as latencydata 
    INNER JOIN dhruv.UsersOn503AndAbove_20171201_200k as required_users
    ON latencydata.to_user = required_users.user_id
    )
SELECT COUNT(DISTINCT to_user) AS Users 
, AVG(latency) AS AvgLatency 
, AVG(CASE WHEN latency > 0 THEN latency ELSE NULL END) AS AvgLatency_Positive 
, PERCENTILE(latency, 0.5) AS 50Percentile 
, PERCENTILE(latency, 0.75) AS 75Percentile 
, PERCENTILE(latency, 0.8) AS 80Percentile 
, PERCENTILE(latency, 0.9) AS 90Percentile 
, PERCENTILE(latency, 0.95) AS 95Percentile 
, PERCENTILE(latency, 0.99) AS 99Percentile
FROM
    (
    SELECT a.dt, a.to_user, (latency_dl.ts - latency_pb.ts) as latency
    FROM MessageCTE_pb as latency_pb
    INNER JOIN MessageCTE_rdl as latency_rdl
    ON latency_pb.dt = latency_rdl.dt and latency_pb.to_user = latency_rdl.to_user and latency_pb.id = latency_rdl.id
    INNER JOIN MessageCTE_dl as latency_dl
    ON latency_rdl.dt = latency_dl.dt and latency_rdl.to_user = latency_dl.to_user and latency_rdl.id = latency_dl.id) AS UserLatency;

Thanks!

Second Query Result: Second Query Result

Upvotes: 1

Views: 656

Answers (1)

MatBailie
MatBailie

Reputation: 86725

Another comment in an answer block so I can post a bunch of SQL...

What is the result of this?

WITH
    UserLatency AS 
(
    SELECT
        latencydata.dt,
        latencydata.to_user,
        latencydata.id,
        MAX(CASE WHEN latencydata.class = 'dl' THEN latencydata.ts END)
        -
        MIN(CASE WHEN latencydata.class = 'pb' THEN latencydata.ts END)
            AS latency
    FROM
        dhruv.MessageLatencyInformation_20171210_20171125_to_20171130_02   AS latencydata
    INNER JOIN
        dhruv.UsersOn503AndAbove_20171201_200k                             AS required_users
            ON latencydata.to_user = required_users.user_id
    GROUP BY
        latencydata.dt,
        latencydata.to_user,
        latencydata.id
    HAVING
        0 < SUM(CASE WHEN latencydata.class  = 'rdl'
                      AND latencydata.family = 'stm' THEN 1 END)
)
SELECT
      COUNT(DISTINCT to_user)                       AS Users
    , AVG(latency)                                  AS AvgLatency 
    , AVG(CASE WHEN latency > 0 THEN latency END)   AS AvgLatency_Positive 
    , PERCENTILE(latency, 0.50)                     AS 50Percentile 
    , PERCENTILE(latency, 0.75)                     AS 75Percentile 
    , PERCENTILE(latency, 0.80)                     AS 80Percentile 
    , PERCENTILE(latency, 0.90)                     AS 90Percentile 
    , PERCENTILE(latency, 0.95)                     AS 95Percentile 
    , PERCENTILE(latency, 0.99)                     AS 99Percentile
FROM
    UserLatency
;

Upvotes: 1

Related Questions