Abs
Abs

Reputation: 57916

SQL Query to find rows that didn't occur this month

I am trying to find the number of sellers that made a sale last month but didn't make a sale this month.

I have a query that works but I don't think its efficient and I haven't figured out how to do this for all months.

SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = 12
AND YEAR(date) = 2015
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN 
(
    SELECT distinct user_id
    FROM transactions
    WHERE MONTH(date) = 1
    AND YEAR(date) = 2016
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
)

The structure of the table is:

+---------+------------+-------------+--------+
| user_id |    date    |   status    | amount |
+---------+------------+-------------+--------+
|       1 | 2016-01-01 | 'COMPLETED' | 1.00   |
|       2 | 2015-12-01 | 'COMPLETED' | 1.00   |
|       3 | 2015-12-01 | 'COMPLETED' | 2.00   |
|       1 | 2015-12-01 | 'COMPLETED' | 3.00   |
+---------+------------+-------------+--------+

So in this case, users with ID 2 and 3, didn't make a sale this month.

Upvotes: 4

Views: 869

Answers (4)

Andreas Rayo Kniep
Andreas Rayo Kniep

Reputation: 6532

Just some input for thought:
You could create aggregated lists of user-IDs per month, representing all the unique buyers in that month. In your application, you would then simply have to subtract the two months in question in order to get all user-IDs that have only made a sale in one of the two months.
See below for query- and post-processing-examples.

In order to make your query efficient, I would recommend at least a 2-column index for table transactions on [status, amount]. However, in order to prevent the query from having to look up data in the actual table, you could even create a 4-column index [status, amount, date, user_id], which should further improve the performance of your query.

Postgres (v9.0+, tested)

SELECT   (DATE_PART('year', t.date) || '-' || DATE_PART('month', t.date)) AS d,
         STRING_AGG( DISTINCT t.user_id::TEXT, ',' ) AS buyers
FROM     transactions t
WHERE    t.status = 'COMPLETED'
AND      t.amount > 0
GROUP BY DATE_PART('year', t.date),
         DATE_PART('month', t.date)
ORDER BY DATE_PART('year', t.date),
         DATE_PART('month', t.date)
;

MySQL (not tested)

SELECT   (YEAR(t.date) || '-' || MONTH(t.date)) AS d,
         GROUP_CONCAT( DISTINCT t.user_id ) AS buyers
FROM     transactions t
WHERE    t.status = 'COMPLETED'
AND      t.amount > 0
GROUP BY YEAR(t.date), MONTH(t.date)
ORDER BY YEAR(t.date), MONTH(t.date)
;

Ruby (example for post-processing)

db_result                   = ActiveRecord::Base.connection_pool.with_connection { |con| con.execute( db_query ) }
unique_buyers               = db_result.map{|e|[e['d'],e['buyers'].split(',')]}.to_h

buyers_dec15_but_not_jan16  = unique_buyers['2015-12'] - unique_buyers['2016-1']
buyers_nov15_but_not_dec16  = unique_buyers['2015-11']||[] - unique_buyers['2015-12']
...(and so on)...

Upvotes: 0

dnoeth
dnoeth

Reputation: 60462

Use conditional aggregation:

SELECT count(*) as users
FROM
 (
    SELECT user_id
    FROM transactions
                       -- 1st of previous month
    WHERE date BETWEEN SUBDATE(SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1), interval 1 month) 
                       -- end of current month
                   AND LAST_DAY(CURRENT_DATE)
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
    GROUP BY user_id
           -- any row from previous month
    HAVING MAX(CASE WHEN date < SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
                    THEN date 
               END) IS NOT NULL
           -- no row in current month
       AND MAX(CASE WHEN date >= SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
                    THEN date 
               END) IS NULL           
 ) AS dt

SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1) = first day of current month

SUBDATE(first day of current month, interval 1 month) = first day of previous month

LAST_DAY(CURRENT_DATE) = end of current month

Upvotes: 3

Strawberry
Strawberry

Reputation: 33935

The following should be pretty efficient. In order to make it even more so, you would need to provide the table definition and and the EXPLAIN.

SELECT COUNT(DISTINCT user_id) users
  FROM transactions t
  LEFT 
  JOIN transactions x
    ON x.user_id = t.user_id
   AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
   AND x.status = 'COMPLETED'
   AND x.amount > 0
 WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
   AND t.status = 'COMPLETED'
   AND t.amount > 0
   AND x.user_id IS NULL;

Upvotes: 1

Nir Levy
Nir Levy

Reputation: 12953

if you want to generify it, you can use curdate() to get current month, and DATE_SUB(curdate(), INTERVAL 1 MONTH) to get last month (you will need to do some if clause for January/December though):

SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = MONTH(DATE_SUB(curdate(), INTERVAL 1 MONTH))
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN 
(
    SELECT distinct user_id
    FROM transactions
    WHERE MONTH(date) = MONTH(curdate())
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
)

as far as efficiency goes, I don't see a problem with this one

Upvotes: 1

Related Questions