Somiya L
Somiya L

Reputation: 55

both the queries should give me the same result - SQL

**Hello, I am trying to find the number of returning and new customers with in a given period of time. I have two queries one is the for just finding the new and returning customers and the other one is the same but is dividing the data into age range and their gender. Technically both of the queries should give me the same totals, but they have different totals. Below are the query, can someone please explain to me what the problem is I am really struggling with it.

Also the total matches in the sample database which I have created but does not match in my actual database.**

I also have a sample database which you can refer in case you need sample data.

https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=be7e1aec30f03edeb0cce246ff05721f

Below is the query for just getting the new and returning customers:

new customer

SELECT
    DECODE(is_new, 1, 'New Customers', 'Returning Customers') type_of_customer,
    COUNT(distinct individual_id) count_of_customers,
    SUM(count_of_transactions) count_of_transactions,
    SUM(sum_of_quantity) sum_of_quantity
FROM (
    SELECT
    individual_id,
    SUM(dollar_value_us),
    sum(quantity) sum_of_quantity,
    count(distinct transaction_number) count_of_transactions,
    CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END is_new
FROM (
    SELECT 
        individual_id, 
        dollar_value_us,
        txn_date,
        quantity,
        transaction_number,
        MIN(txn_date) OVER(PARTITION BY individual_id) min_txn_date          
    FROM transaction_detail_mv   
    WHERE 
        brand_org_code = 'BRAND'
        AND is_merch = 1
        AND currency_code = 'USD'
        AND line_item_amt_type_cd = 'S'
)
WHERE 
    txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY') 
    AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
GROUP BY
    individual_id,
    min_txn_date
    )
x GROUP BY is_new

below is the query which is dividing the same data above in their age range and gender but I have two separate queries for new and returning customers.

for new customer the query is :

select gender, 
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end as AgeGroup
, count(distinct individual_id) indiv
, count (distinct transaction_number) txn_count
, sum(dollar_value_us) as Spend
, sum(quantity) Qty

from (SELECT 
        a.individual_id, 
        a.dollar_value_us,
        a.txn_date,
        a.quantity,
        a.transaction_number,
        b.gender,
        b.age,
        MIN(txn_date) OVER(PARTITION BY a.individual_id) min_txn_date          
    FROM transaction_detail_mv   a
    join gender_details b on a.individual_id = b.individual_id
    WHERE 
        a.brand_org_code = 'BRAND'
        AND a.is_merch = 1
        AND a.currency_code = 'USD'
        AND a.line_item_amt_type_cd = 'S')

where txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY') 
    AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
    AND min_txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
    AND min_txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')


group by gender, 
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end

for returning customers:

select gender, 
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end as AgeGroup
, count(distinct individual_id) indiv
, count (distinct transaction_number) txn_count
, sum(dollar_value_us) as Spend
, sum(quantity) Qty

from (SELECT 
        a.individual_id, 
        a.dollar_value_us,
        a.txn_date,
        a.quantity,
        a.transaction_number,
        b.gender,
        b.age,
        MIN(txn_date) OVER(PARTITION BY a.individual_id) min_txn_date          
    FROM transaction_detail_mv   a
    join gender_details b on a.individual_id = b.individual_id
    WHERE 
        a.brand_org_code = 'BRAND'
        AND a.is_merch = 1
        AND a.currency_code = 'USD'
        AND a.line_item_amt_type_cd = 'S')

where txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY') 
    AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
    AND min_txn_date <TO_DATE('10-02-2019', 'DD-MM-YYYY')


group by gender, 
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end

Upvotes: 0

Views: 92

Answers (1)

bruceskyaus
bruceskyaus

Reputation: 783

Make sure you are comparing whole dates in your WHERE clauses. If a time value creeps into the data, you may get inconsistent results.

Update your WHERE clauses for new customers to be:

where TRUNC(txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
    AND TRUNC(txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
    AND TRUNC(min_txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
    AND TRUNC(min_txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')

And your returning customers to be:

where TRUNC(txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
    AND TRUNC(txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
    AND TRUNC(min_txn_date) <TO_DATE('10-02-2019', 'DD-MM-YYYY')

I also recommend to refactor your queries, so that you don't mix the row-based code with set-based code. In other words, do your row-by-row processing in subqueries/WITH statements, then your aggregation afterwards. This will make the SQL much easier to understand and maintain.

Example 1 - New Customers

SELECT
t.gender,
t.AgeGroup,
count(distinct t.individual_id) as indiv,
count(distinct t.transaction_number) as txn_count,
sum(t.dollar_value_us) as Spend,
sum(t.quantity) as Qty
from (
        SELECT
        a.individual_id,
        a.dollar_value_us,
        a.txn_date,
        a.quantity,
        a.transaction_number,
        b.gender,
        b.age,
        case
            when b.age < 18 then '<18'
            when b.age between 18 and 24 then '18-24'
            when b.age between 25 and 32 then '25-32'
            when b.age between 33 and 39 then '35-39'
            when b.age between 40 and 46 then '40-46'
            when b.age between 47 and 53 then '46-52'
            when b.age between 54 and 60 then '53-58'
            when b.age > 60 then '61+'
        end as AgeGroup,
        MIN(a.txn_date) OVER (PARTITION BY a.individual_id) as min_txn_date
        FROM transaction_detail_mv a
        inner join gender_details b on b.individual_id = a.individual_id
        WHERE a.brand_org_code = 'BRAND'
        AND a.is_merch = 1
        AND a.currency_code = 'USD'
        AND a.line_item_amt_type_cd = 'S'
    ) t
where TRUNC(t.txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
group by t.gender,
         t.AgeGroup;

Example 2 - Returning Customers

select
t.gender,
t.AgeGroup,
count(distinct individual_id) as indiv,
count (distinct transaction_number) as txn_count,
sum(dollar_value_us) as Spend,
sum(quantity) as Qty
from (
        SELECT
        a.individual_id,
        a.dollar_value_us,
        a.txn_date,
        a.quantity,
        a.transaction_number,
        b.gender,
        b.age,
        case
            when b.age < 18 then '<18'
            when b.age between 18 and 24 then '18-24'
            when b.age between 25 and 32 then '25-32'
            when b.age between 33 and 39 then '35-39'
            when b.age between 40 and 46 then '40-46'
            when b.age between 47 and 53 then '46-52'
            when b.age between 54 and 60 then '53-58'
            when b.age > 60 then '61+'
        end as AgeGroup,
        MIN(txn_date) OVER(PARTITION BY a.individual_id) as min_txn_date
        FROM transaction_detail_mv a
        inner join gender_details b on b.individual_id = a.individual_id
        WHERE a.brand_org_code = 'BRAND'
        AND a.is_merch = 1
        AND a.currency_code = 'USD'
        AND a.line_item_amt_type_cd = 'S'
    ) t
where TRUNC(t.txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) <TO_DATE('10-02-2019', 'DD-MM-YYYY')
group by t.gender,
         t.AgeGroup;

Upvotes: 2

Related Questions