Reputation: 55
**Hello, I am trying to find the number of returning and new customers with in a given period of time. I have two queries one is the for just finding the new and returning customers and the other one is the same but is dividing the data into age range and their gender. Technically both of the queries should give me the same totals, but they have different totals. Below are the query, can someone please explain to me what the problem is I am really struggling with it.
Also the total matches in the sample database which I have created but does not match in my actual database.**
I also have a sample database which you can refer in case you need sample data.
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=be7e1aec30f03edeb0cce246ff05721f
Below is the query for just getting the new and returning customers:
new customer
SELECT
DECODE(is_new, 1, 'New Customers', 'Returning Customers') type_of_customer,
COUNT(distinct individual_id) count_of_customers,
SUM(count_of_transactions) count_of_transactions,
SUM(sum_of_quantity) sum_of_quantity
FROM (
SELECT
individual_id,
SUM(dollar_value_us),
sum(quantity) sum_of_quantity,
count(distinct transaction_number) count_of_transactions,
CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END is_new
FROM (
SELECT
individual_id,
dollar_value_us,
txn_date,
quantity,
transaction_number,
MIN(txn_date) OVER(PARTITION BY individual_id) min_txn_date
FROM transaction_detail_mv
WHERE
brand_org_code = 'BRAND'
AND is_merch = 1
AND currency_code = 'USD'
AND line_item_amt_type_cd = 'S'
)
WHERE
txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
GROUP BY
individual_id,
min_txn_date
)
x GROUP BY is_new
below is the query which is dividing the same data above in their age range and gender but I have two separate queries for new and returning customers.
for new customer the query is :
select gender,
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end as AgeGroup
, count(distinct individual_id) indiv
, count (distinct transaction_number) txn_count
, sum(dollar_value_us) as Spend
, sum(quantity) Qty
from (SELECT
a.individual_id,
a.dollar_value_us,
a.txn_date,
a.quantity,
a.transaction_number,
b.gender,
b.age,
MIN(txn_date) OVER(PARTITION BY a.individual_id) min_txn_date
FROM transaction_detail_mv a
join gender_details b on a.individual_id = b.individual_id
WHERE
a.brand_org_code = 'BRAND'
AND a.is_merch = 1
AND a.currency_code = 'USD'
AND a.line_item_amt_type_cd = 'S')
where txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND min_txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND min_txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
group by gender,
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end
for returning customers:
select gender,
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end as AgeGroup
, count(distinct individual_id) indiv
, count (distinct transaction_number) txn_count
, sum(dollar_value_us) as Spend
, sum(quantity) Qty
from (SELECT
a.individual_id,
a.dollar_value_us,
a.txn_date,
a.quantity,
a.transaction_number,
b.gender,
b.age,
MIN(txn_date) OVER(PARTITION BY a.individual_id) min_txn_date
FROM transaction_detail_mv a
join gender_details b on a.individual_id = b.individual_id
WHERE
a.brand_org_code = 'BRAND'
AND a.is_merch = 1
AND a.currency_code = 'USD'
AND a.line_item_amt_type_cd = 'S')
where txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND min_txn_date <TO_DATE('10-02-2019', 'DD-MM-YYYY')
group by gender,
case when age < 18 then '<18'
when age between 18 and 24 then '18-24'
when age between 25 and 32 then '25-32'
when age between 33 and 39 then '35-39'
when age between 40 and 46 then '40-46'
when age between 47 and 53 then '46-52'
when age between 54 and 60 then '53-58'
when age > 60 then '61+' end
Upvotes: 0
Views: 92
Reputation: 783
Make sure you are comparing whole dates in your WHERE
clauses. If a time value creeps into the data, you may get inconsistent results.
Update your WHERE
clauses for new customers to be:
where TRUNC(txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(min_txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(min_txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
And your returning customers to be:
where TRUNC(txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(min_txn_date) <TO_DATE('10-02-2019', 'DD-MM-YYYY')
I also recommend to refactor your queries, so that you don't mix the row-based code with set-based code. In other words, do your row-by-row processing in subqueries/WITH statements, then your aggregation afterwards. This will make the SQL much easier to understand and maintain.
Example 1 - New Customers
SELECT
t.gender,
t.AgeGroup,
count(distinct t.individual_id) as indiv,
count(distinct t.transaction_number) as txn_count,
sum(t.dollar_value_us) as Spend,
sum(t.quantity) as Qty
from (
SELECT
a.individual_id,
a.dollar_value_us,
a.txn_date,
a.quantity,
a.transaction_number,
b.gender,
b.age,
case
when b.age < 18 then '<18'
when b.age between 18 and 24 then '18-24'
when b.age between 25 and 32 then '25-32'
when b.age between 33 and 39 then '35-39'
when b.age between 40 and 46 then '40-46'
when b.age between 47 and 53 then '46-52'
when b.age between 54 and 60 then '53-58'
when b.age > 60 then '61+'
end as AgeGroup,
MIN(a.txn_date) OVER (PARTITION BY a.individual_id) as min_txn_date
FROM transaction_detail_mv a
inner join gender_details b on b.individual_id = a.individual_id
WHERE a.brand_org_code = 'BRAND'
AND a.is_merch = 1
AND a.currency_code = 'USD'
AND a.line_item_amt_type_cd = 'S'
) t
where TRUNC(t.txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
group by t.gender,
t.AgeGroup;
Example 2 - Returning Customers
select
t.gender,
t.AgeGroup,
count(distinct individual_id) as indiv,
count (distinct transaction_number) as txn_count,
sum(dollar_value_us) as Spend,
sum(quantity) as Qty
from (
SELECT
a.individual_id,
a.dollar_value_us,
a.txn_date,
a.quantity,
a.transaction_number,
b.gender,
b.age,
case
when b.age < 18 then '<18'
when b.age between 18 and 24 then '18-24'
when b.age between 25 and 32 then '25-32'
when b.age between 33 and 39 then '35-39'
when b.age between 40 and 46 then '40-46'
when b.age between 47 and 53 then '46-52'
when b.age between 54 and 60 then '53-58'
when b.age > 60 then '61+'
end as AgeGroup,
MIN(txn_date) OVER(PARTITION BY a.individual_id) as min_txn_date
FROM transaction_detail_mv a
inner join gender_details b on b.individual_id = a.individual_id
WHERE a.brand_org_code = 'BRAND'
AND a.is_merch = 1
AND a.currency_code = 'USD'
AND a.line_item_amt_type_cd = 'S'
) t
where TRUNC(t.txn_date) >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.txn_date) < TO_DATE('17-02-2019', 'DD-MM-YYYY')
AND TRUNC(t.min_txn_date) <TO_DATE('10-02-2019', 'DD-MM-YYYY')
group by t.gender,
t.AgeGroup;
Upvotes: 2