ahs85
ahs85

Reputation: 115

Highest appearance of a value within each group

I'm having a table with an user-ID and the Webside-IDs he navigated through. The table looks like this:

| user-ID | website-ID |
|       1 |          1 |
|       1 |          2 |
|       1 |          1 |
|       1 |          4 |
|       2 |          1 |
|       2 |          2 |
|       2 |          5 |
|       2 |          2 |

I'd like to select the website-ID each user has been into the most. As you can see, a min, max or avg won't work here. Any ideas?

Upvotes: 2

Views: 1654

Answers (4)

Jonathan Leffler
Jonathan Leffler

Reputation: 753930

Work it out, piece-meal - step by step.

Note that the data shown does not have a primary key; there are repeated rows. Presumably the full data set has other information such as time of visit that makes the rows unique. It doesn't matter too much for the question at hand, but you aim to have tables with primary keys whenever possible (which is essentially always).

Note too that the identifiers 'user-ID' and 'website-ID' are not valid in standard SQL unless treated as delimited identifier; for the purposes at hand, the dashes are changed to underscores.

This treatment uses basic SQL. It does not use ROLAP extensions, and it does not use WITH clauses that give names to common expressions.

How many times has each user been to each web site?

SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
  FROM WebSitesVisited
 GROUP BY User_ID, Website_ID;

Which is the maximum number of times a user visited any web site?

SELECT User_ID, MAX(NumberOfVisits) AS MaxNumberOfVisits
  FROM (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
          FROM WebSitesVisited
         GROUP BY User_ID, Website_ID
       ) AS V
 GROUP BY User_ID;

Which web sites did a given user visit the most times?

SELECT V.User_ID, V.Website_ID, V.NumberOfVisits
  FROM (SELECT User_ID, MAX(NumberOfVisits) AS MaxNumberOfVisits
          FROM (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
                  FROM WebSitesVisited
                 GROUP BY User_ID, Website_ID
               ) AS V2
         GROUP BY User_ID
       ) AS M
  JOIN (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
          FROM WebSitesVisited
         GROUP BY User_ID, Website_ID
       ) AS V
    ON M.User_ID = V.User_ID AND M.NumberOfVisits = V.MaxNumberOfVisits
 ORDER BY V.User_ID;

Note that if user X visited both site Y and site Z the same number of times (and that number of times was the most that user X visited any site) then both Y and Z will appear in the list for user X.

Upvotes: 0

RichardTheKiwi
RichardTheKiwi

Reputation: 107716

This may look complicated but it will only aggregate the data once only, then pass through it ranking the results and choosing only the first one

select userid, websiteid, visits
from
(
select 
 userid, websiteid, visits,
 @r := case when @u=userid then @r+1 else 1 end r,
 @u := userid
from
(select @u:=null) x,
(select userid, websiteid, count(*) visits
 from visit
 group by userid, websiteid
 order by userid, visits desc) y
) z
where r=1

The inner select generates counts for each user-website combination and orders it by the most visits. This is then passed through the middle query that ranks the records per user, giving the rank in the column r.

This is a variation, which will show ALL websites for a user that have EQUAL rank. The difference with the previous query is that if websites A and B both have 10 visits from user X, both A and B are listed in the result whereas the previous query selects one randomly to show.

select userid, websiteid, visits
from
(
select 
 userid, websiteid, visits,
 @r := case
    when @u=userid and @v=visits then @r  # same rank
    when @u=userid then @r+1              # next rank
    else 1                                # different user
    end r,
 @u := userid, @v := visits
from
(select @u:=null, @v:=null) x,
(select userid, websiteid, count(*) visits
 from visit
 group by userid, websiteid
 order by userid, visits desc) y
) z
where r=1

This is the test table used

create table visit (userid int, websiteid int);
insert into visit select 1,1;
insert into visit select 1,2;
insert into visit select 1,1;
insert into visit select 1,4;
insert into visit select 2,1;
insert into visit select 2,2;
insert into visit select 2,5;
insert into visit select 2,2;

Upvotes: 0

Joe Stefanelli
Joe Stefanelli

Reputation: 135808

I've introduced a temporary table just to make things a little easier to read. You could certainly substitute that query inline and eliminate the temp table if desired.

The first select sums the visits per user and website.

The second select finds the maximum visits for each user in the subquery and then joins back to the temp table to find the webiste whose count matches that maximum value.

create temporary table TempSum
    select user-ID, website-ID, count(*) as TotalCount
        from YourTable
        group by user-ID, website-ID

select ts.user-ID, ts.website-ID, ts.TotalCount
    from (select user-ID, max(TotalCount) as MaxCount
              from TempSum
              group by user-ID) ms
        inner join TempSum ts
            on ts.user-ID = ms.user-ID
                and ts.TotalCount = ms.MaxCount

Upvotes: 3

Mano Kovacs
Mano Kovacs

Reputation: 1514

I am not sure I get your question right. If you would like to get the website-ID with the most visits, you can use this:

select count(user-ID) as cnt from dataTable group by website-ID order by cnt desc

Upvotes: 0

Related Questions