Reputation: 57916
I am trying to find the number of sellers that made a sale last month but didn't make a sale this month.
I have a query that works but I don't think its efficient and I haven't figured out how to do this for all months.
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = 12
AND YEAR(date) = 2015
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = 1
AND YEAR(date) = 2016
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
The structure of the table is:
+---------+------------+-------------+--------+
| user_id | date | status | amount |
+---------+------------+-------------+--------+
| 1 | 2016-01-01 | 'COMPLETED' | 1.00 |
| 2 | 2015-12-01 | 'COMPLETED' | 1.00 |
| 3 | 2015-12-01 | 'COMPLETED' | 2.00 |
| 1 | 2015-12-01 | 'COMPLETED' | 3.00 |
+---------+------------+-------------+--------+
So in this case, users with ID 2
and 3
, didn't make a sale this month.
Upvotes: 4
Views: 869
Reputation: 6532
Just some input for thought:
You could create aggregated lists of user-IDs per month, representing all the unique buyers in that month. In your application, you would then simply have to subtract the two months in question in order to get all user-IDs that have only made a sale in one of the two months.
See below for query- and post-processing-examples.
In order to make your query efficient, I would recommend at least a 2-column index for table transactions on [status, amount]
. However, in order to prevent the query from having to look up data in the actual table, you could even create a 4-column index [status, amount, date, user_id]
, which should further improve the performance of your query.
Postgres (v9.0+, tested)
SELECT (DATE_PART('year', t.date) || '-' || DATE_PART('month', t.date)) AS d,
STRING_AGG( DISTINCT t.user_id::TEXT, ',' ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
ORDER BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
;
MySQL (not tested)
SELECT (YEAR(t.date) || '-' || MONTH(t.date)) AS d,
GROUP_CONCAT( DISTINCT t.user_id ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY YEAR(t.date), MONTH(t.date)
ORDER BY YEAR(t.date), MONTH(t.date)
;
Ruby (example for post-processing)
db_result = ActiveRecord::Base.connection_pool.with_connection { |con| con.execute( db_query ) }
unique_buyers = db_result.map{|e|[e['d'],e['buyers'].split(',')]}.to_h
buyers_dec15_but_not_jan16 = unique_buyers['2015-12'] - unique_buyers['2016-1']
buyers_nov15_but_not_dec16 = unique_buyers['2015-11']||[] - unique_buyers['2015-12']
...(and so on)...
Upvotes: 0
Reputation: 60462
Use conditional aggregation:
SELECT count(*) as users
FROM
(
SELECT user_id
FROM transactions
-- 1st of previous month
WHERE date BETWEEN SUBDATE(SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1), interval 1 month)
-- end of current month
AND LAST_DAY(CURRENT_DATE)
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
GROUP BY user_id
-- any row from previous month
HAVING MAX(CASE WHEN date < SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NOT NULL
-- no row in current month
AND MAX(CASE WHEN date >= SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NULL
) AS dt
SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
= first day of current month
SUBDATE(first day of current month, interval 1 month)
= first day of previous month
LAST_DAY(CURRENT_DATE)
= end of current month
Upvotes: 3
Reputation: 33935
The following should be pretty efficient. In order to make it even more so, you would need to provide the table definition and and the EXPLAIN.
SELECT COUNT(DISTINCT user_id) users
FROM transactions t
LEFT
JOIN transactions x
ON x.user_id = t.user_id
AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
AND x.status = 'COMPLETED'
AND x.amount > 0
WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
AND t.status = 'COMPLETED'
AND t.amount > 0
AND x.user_id IS NULL;
Upvotes: 1
Reputation: 12953
if you want to generify it, you can use curdate()
to get current month, and DATE_SUB(curdate(), INTERVAL 1 MONTH)
to get last month (you will need to do some if clause for January/December though):
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = MONTH(DATE_SUB(curdate(), INTERVAL 1 MONTH))
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = MONTH(curdate())
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
as far as efficiency goes, I don't see a problem with this one
Upvotes: 1