Reputation: 329
I am experiencing unusual behavior with Redshift where a basic query with inner joins has just stopped working as it is expected to.
Providing the query below,
SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
INNER JOIN ams.t_asin_segmentation_value B
ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
INNER JOIN ams.t_asin_segmentation_type C
ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
INNER JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';
This Query is returning just a single row which is incorrect. It is supposed to return more than a 1000 rows.
When I write the same query differently, just removing the last JOIN with table aliased D and writing it this way,
SELECT asin, client_account_id
FROM ams.t_asin
WHERE id IN (SELECT DISTINCT A_0.asin_id
FROM ams.t_asin_segmentation_detail A_0
INNER JOIN ams.t_asin_segmentation_value B
ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND
A_0.IS_ACTIVE = 'Y' AND
B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
INNER JOIN ams.t_asin_segmentation_type C
ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15)
AND is_active = 'Y'
AND is_deleted = 'N';
This starts fetching the correct number of rows without any logical change in the Query itself.
I further tweaked with the original query and just replaced the INNER JOIN with table aliased D to a LEFT JOIN and surprisingly it started giving the correct number of rows. It is important to note that even with a LEFT JOIN, the SELECT statement is still fetching columns from table aliased D which as shown below,
SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
INNER JOIN ams.t_asin_segmentation_value B
ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
INNER JOIN ams.t_asin_segmentation_type C
ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
LEFT JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';
Would appreciate if anyone can provide any reason for this to happen.
Upvotes: 0
Views: 472
Reputation: 329
Got this resolved by changing the encoding type on Sort Keys from ZSTD to RAW. Still not entirely sure how that could have led to change in behavior of INNER JOIN.
The Sort Keys in these queries are all the ID columns being used in joins.
Both queries (INNER JOIN and WHERE IN) returning the same rows after this change.
Upvotes: 0
Reputation: 1270713
Your queries are nothing similar. Your first query is saying that all matching rows have the same values of D.asin
and D.client_account_id
. That is why you are getting one row.
The second query has no distinct
in the outer query, so it will fetch all those rows.
I'm not sure why you would think that these are logically equivalent.
Upvotes: 0