Aneeb Khawar
Aneeb Khawar

Reputation: 329

Redshift behaving differetly for the same query written differently

I am experiencing unusual behavior with Redshift where a basic query with inner joins has just stopped working as it is expected to.

Providing the query below,

SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
       INNER JOIN ams.t_asin_segmentation_value B
                  ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
                     B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
       INNER JOIN ams.t_asin_segmentation_type C
                  ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                     C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
       INNER JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';    

This Query is returning just a single row which is incorrect. It is supposed to return more than a 1000 rows.

When I write the same query differently, just removing the last JOIN with table aliased D and writing it this way,

SELECT asin, client_account_id
FROM ams.t_asin
WHERE id IN (SELECT DISTINCT A_0.asin_id
             FROM ams.t_asin_segmentation_detail A_0
                    INNER JOIN ams.t_asin_segmentation_value B
                               ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND
                                  A_0.IS_ACTIVE = 'Y' AND
                                  B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
                    INNER JOIN ams.t_asin_segmentation_type C
                               ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                                  C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15)
  AND is_active = 'Y'
  AND is_deleted = 'N';

This starts fetching the correct number of rows without any logical change in the Query itself.

I further tweaked with the original query and just replaced the INNER JOIN with table aliased D to a LEFT JOIN and surprisingly it started giving the correct number of rows. It is important to note that even with a LEFT JOIN, the SELECT statement is still fetching columns from table aliased D which as shown below,

SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
       INNER JOIN ams.t_asin_segmentation_value B
                  ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
                     B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
       INNER JOIN ams.t_asin_segmentation_type C
                  ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                     C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
       LEFT JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';

Would appreciate if anyone can provide any reason for this to happen.

Upvotes: 0

Views: 472

Answers (2)

Aneeb Khawar
Aneeb Khawar

Reputation: 329

Got this resolved by changing the encoding type on Sort Keys from ZSTD to RAW. Still not entirely sure how that could have led to change in behavior of INNER JOIN.

The Sort Keys in these queries are all the ID columns being used in joins.

Both queries (INNER JOIN and WHERE IN) returning the same rows after this change.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270713

Your queries are nothing similar. Your first query is saying that all matching rows have the same values of D.asin and D.client_account_id. That is why you are getting one row.

The second query has no distinct in the outer query, so it will fetch all those rows.

I'm not sure why you would think that these are logically equivalent.

Upvotes: 0

Related Questions