Ali Ahmad
Ali Ahmad

Reputation: 1055

How to categorize customer through MYSQL query?

I am finding difficulty in writing mysql query to categorize my customers. I am categorizing customers based on number of hits on my website.like

  1. New customer with one hits.
  2. New customer with multiple hits.
  3. Old customer

My Log table schema is as follows

Unique customer ID, Current Date, Subscribed, Hits Count

To categorize customer how can I compare current date customer logs with all the previous date logs through single query

Upvotes: 0

Views: 309

Answers (1)

spencer7593
spencer7593

Reputation: 108510

It's not clear from your description, is customer_id unique?

Or is it the tuple (customer_id,current_date,subscribed,hits_count) that is unique?

If customer_id is unique, then something like this will return the specified result:

SELECT t.customer_id
     , CASE
       WHEN t.hits_count = 1 AND t.current_date = DATE(NOW())
       THEN 'New customer with one hits.'
       WHEN t.hits_count > 1 AND t.current_date = DATE(NOW())
       THEN 'New customer with multiple hits.'
       ELSE 'Old customer'
       END AS category
  FROM mytable t

If customer_id is not unique, then one way (but not the most efficient way) to get the specified result:

SELECT t.customer_id
     , CASE
       WHEN t.total_hits_count = 1 AND t.min_current_date = DATE(NOW())
       THEN 'New customer with one hits.'
       WHEN t.total_hits_count > 1 AND t.min_current_date = DATE(NOW())
       THEN 'New customer with multiple hits.'
       ELSE 'Old customer'
       END AS category
  FROM ( SELECT h.customer_id
              , MIN(h.current_date) AS min_current_date
              , SUM(h.hits_count) AS total_hits_count
           FROM mytable h
          GROUP BY h.customer_id
       ) t

The inline view aliased as t gets us unique values for customer_id, along with the earliest current_date, and the total of the hits_count. (You can run just the query inside the parens to verify it's returning the desired result.) The outer query is identical to the first query, with just some renamed columns.

The inline view isn't necessary, you could get an equivalent result (more efficiently) with something like this:

SELECT t.customer_id
     , CASE
       WHEN SUM(t.hits_count) = 1 AND MIN(t.current_date) = DATE(NOW())
       THEN 'New customer with one hits.'
       WHEN SUM(t.hits_count) > 1 AND MIN(t.current_date) = DATE(NOW())
       THEN 'New customer with multiple hits.'
       ELSE 'Old customer'
       END AS category
  FROM mytable t
 GROUP BY t.customer_id

NOTE There's some corner cases that will cause customer_id to be categorized as 'Old customer', such as SUM(t.hits_count) < 1, or t.current_date IS NULL, etc.

To specifically test for a row with a current_date before today's date, make a specific test for that in the CASE expression:

SELECT t.customer_id
     , CASE
       WHEN SUM(t.hits_count) = 1 AND MIN(t.current_date) = DATE(NOW())
       THEN 'New customer with one hits.'
       WHEN SUM(t.hits_count) > 1 AND MIN(t.current_date) = DATE(NOW())
       THEN 'New customer with multiple hits.'
       WHEN MIN(t.current_date) < DATE(NOW())
       THEN 'Old customer'
       ELSE 'Some other category'
       END AS category
  FROM mytable t
 GROUP BY t.customer_id

NOTE

I assumed that the current_date column was of type DATE, and not DATETIME or TIMESTAMP. If that column also includes a time component which is not equal to midnight 00:00:00, then the equality comparison to DATE(NOW()) is not going to return TRUE whenever that time component is not midnight.

In that case, we'd prefer to check a range of datetime values, replacing

... AND t.current_date = DATE(NOW()) 

with something like this:

... AND t.current_date >= DATE(NOW()) AND t.current_date < DATE(NOW()) + INTERVAL 1 DAY

Upvotes: 1

Related Questions