Crashmeister
Crashmeister

Reputation: 385

Bizarre result in SQL query - PostgreSQL

I discovered this strange behavior with this query:

-- TP4N has stock_class = 'Bond'

select  lot.symbol
      , round(sum(lot.qty_left), 4) as "Qty"
  from  ( select  symbol
                , qty_left
--                , amount
            from trade_lot_tbl t01
            where t01.symbol not in (select symbol from stock_tbl where stock_class = 'Cash')
              and t01.qty_left > 0
              and t01.trade_date <= current_date        -- only current trades
           union 
           select  'CASH' as symbol
            , sum(qty_left) as qty_left
--            , sum(amount)   as amount
            from trade_lot_tbl t11
            where t11.symbol in (select symbol from stock_tbl where stock_class = 'Cash')
              and t11.qty_left > 0
              and t11.trade_date <= current_date        -- only current trades
            group by t11.symbol
        ) lot
  group by lot.symbol
  order by lot.symbol
;

Run as is, the Qty for TP4N is 1804.42

Run with the two 'amount' lines un-commented, which as far as I can tell should NOT affect the result, yet Qty for TP4N = 1815.36. Only ONE of the symbols (TP4N) has a changed value, all others remain the same.

Run with the entire 'union' statement commented out results in Qty for TP4N = 1827.17

The correct answer, as far as I can tell, is 1827.17.

So, to summarize, I get three different values by modifying parts of the query that, as far as I can tell, should NOT affect the answer.

I'm sure I'm going to kick myself when the puzzle is solved, this smells like a silly mistake.

Upvotes: 0

Views: 39

Answers (1)

GMB
GMB

Reputation: 222582

Likely, what you are seeing is caused by the use of union. This set operator deduplicates the resultsets that are returned by both queries. So adding or removing columns in the unioned sets may affect the final resultset (by default, adding more columns reduces the risk of duplication).

As a rule of thumb: unless you do want deduplication, you should use union all (which is also more efficient, since the database does not need to search for duplicates).

Upvotes: 3

Related Questions