Andreas Bonini
Andreas Bonini

Reputation: 44742

Count number of consecutive visits

Every time a logged in user visits the website their data is put into a table containing the userId and date (either one or zero row per user per day):

   444631 2011-11-07
   444631 2011-11-06
   444631 2011-11-05
   444631 2011-11-04
   444631 2011-11-02
   444631 2011-11-01

I need to have ready access to the number of consecutive visits when I pull the user data from the main user table.. In the case for this user, it would be 4.

Currently I'm doing this through a denormalized consecutivevisits counter in the main user table, however for unknown reasons it sometimes resets.. I want to try an approach that uses exclusively the data in the table above.

What's the best SQL query to get that number (4 in the example above)? There are users who have hundreds of visits, we have millions of registered users and hits per day.

EDIT: As per the comments below I'm posting the code I currently use to do this; it however has the problem that it sometimes resets for no reason and it also reset it for everyone during the weekend, most likely because of the DST change.

// Called every page load for logged in users
public static function OnVisit($user)
{
    $lastVisit = $user->GetLastVisit(); /* Timestamp; db server is on the same timezone as www server */
    if(!$lastVisit)
        $delta = 2;
    else
    {
        $today = date('Y/m/d');

        if(date('Y/m/d', $lastVisit) == $today)
            $delta = 0;
        else if(date('Y/m/d', $lastVisit + (24 * 60 * 60)) == $today)
            $delta = 1;
        else
            $delta = 2;
    }

    if(!$delta)
        return;

    $visits = $user->GetConsecutiveVisits();
    $userId = $user->GetId();

            /* NOTE: t_dailyvisit is the table I pasted above. The table is unused;
             * I added it only to ensure that the counter sometimes really resets
             * even if the user visits the website, and I could confirm that. */
    q_Query("INSERT IGNORE INTO `t_dailyvisit` (`user`, `date`) VALUES ($userId, CURDATE())", DB_DATABASE_COMMON);

    /* User skipped 1 or more days.. */
    if($delta > 1)
        $visits = 1;
    else if($delta == 1)
        $visits += 1;

    q_Query("UPDATE `t_user` SET `consecutivevisits` = $visits, `lastvisit` = CURDATE(), `nvotesday` = 0 WHERE `id` = $userId", DB_DATABASE_COMMON);
    $user->ForceCacheExpire();
}

Upvotes: 1

Views: 1632

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656461

I missed the mysql tag and wrote up this solution. Sadly, this does not work in MySQL as it does not support window functions.

I post it anyway, as I put some effort into it. Tested with PostgreSQL. Would work similarly with Oracle or SQL Server (or any other decent RDBMS that supports window functions).

Test setup

CREATE TEMP TABLE v(id int, visit date);
INSERT INTO v VALUES
 (444631, '2011-11-07')
,(444631, '2011-11-06')
,(444631, '2011-11-05')
,(444631, '2011-11-04')
,(444631, '2011-11-02')
,(444631, '2011-11-01')
,(444632, '2011-12-02')
,(444632, '2011-12-03')
,(444632, '2011-12-05');

Simple version

-- add 1 to "difference" to get number of days of the longest period
SELECT id, max(dur) + 1 as max_consecutive_days
FROM (

   -- calculate date difference of min and max in the group
   SELECT id, grp, max(visit) - min(visit) as dur
   FROM (

      -- consecutive days end up in a group
      SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
      FROM   (

         -- step up at the start of a new group of days
         SELECT id
               ,row_number() OVER w AS rn
               ,visit
               ,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
                THEN 0 ELSE 1 END AS step
         FROM   v
         WINDOW w AS (PARTITION BY id ORDER BY visit)
         ORDER  BY 1,2
         ) x
      ) y
      GROUP BY 1,2
   ) z
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

Output:

   id   | max_consecutive_days
--------+----------------------
 444631 |                    4

Faster / Shorter

I later found an even better way. grp numbers are not continuous (but continuously rising). Doesn't matter, since those are just a mean to an end:

SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
    SELECT id, grp, max(visit) - min(visit) AS dur
    FROM (
      -- subtract an integer representing the number of day from the row_number()
      -- creates a "group number" (grp) for consecutive days
      SELECT id
            ,EXTRACT(epoch from visit)::int / 86400
           - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
            ,visit
      FROM   v
      ORDER  BY 1,2
      ) x
    GROUP BY 1,2
    ) y
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

SQL Fiddle for both.

More

Upvotes: 3

Fischermaen
Fischermaen

Reputation: 12458

If it is not necessary to have a log of every day the user was logged on to the webiste and you only want to know the consecutive days he was logged on, I would prefer this way:

Chose 3 columns: LastVisit (Date), ConsecutiveDays (int) and User.

On log-in you check the entry for the user, determine if last visit was "Today - 1", then add 1 to the columns ConsecutiveDays and store "Today" in column LastVisit. If last vist is greater than "Today - 1" then store 1 in ConsecutiveDays.

HTH

Upvotes: 1

Related Questions