Reputation: 115
I'm having a table with an user-ID and the Webside-IDs he navigated through. The table looks like this:
| user-ID | website-ID |
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 5 |
| 2 | 2 |
I'd like to select the website-ID each user has been into the most. As you can see, a min, max or avg won't work here. Any ideas?
Upvotes: 2
Views: 1654
Reputation: 753930
Work it out, piece-meal - step by step.
Note that the data shown does not have a primary key; there are repeated rows. Presumably the full data set has other information such as time of visit that makes the rows unique. It doesn't matter too much for the question at hand, but you aim to have tables with primary keys whenever possible (which is essentially always).
Note too that the identifiers 'user-ID' and 'website-ID' are not valid in standard SQL unless treated as delimited identifier; for the purposes at hand, the dashes are changed to underscores.
This treatment uses basic SQL. It does not use ROLAP extensions, and it does not use WITH clauses that give names to common expressions.
SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
FROM WebSitesVisited
GROUP BY User_ID, Website_ID;
SELECT User_ID, MAX(NumberOfVisits) AS MaxNumberOfVisits
FROM (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
FROM WebSitesVisited
GROUP BY User_ID, Website_ID
) AS V
GROUP BY User_ID;
SELECT V.User_ID, V.Website_ID, V.NumberOfVisits
FROM (SELECT User_ID, MAX(NumberOfVisits) AS MaxNumberOfVisits
FROM (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
FROM WebSitesVisited
GROUP BY User_ID, Website_ID
) AS V2
GROUP BY User_ID
) AS M
JOIN (SELECT User_ID, Website_ID, COUNT(*) AS NumberOfVisits
FROM WebSitesVisited
GROUP BY User_ID, Website_ID
) AS V
ON M.User_ID = V.User_ID AND M.NumberOfVisits = V.MaxNumberOfVisits
ORDER BY V.User_ID;
Note that if user X visited both site Y and site Z the same number of times (and that number of times was the most that user X visited any site) then both Y and Z will appear in the list for user X.
Upvotes: 0
Reputation: 107716
This may look complicated but it will only aggregate the data once only, then pass through it ranking the results and choosing only the first one
select userid, websiteid, visits
from
(
select
userid, websiteid, visits,
@r := case when @u=userid then @r+1 else 1 end r,
@u := userid
from
(select @u:=null) x,
(select userid, websiteid, count(*) visits
from visit
group by userid, websiteid
order by userid, visits desc) y
) z
where r=1
The inner select generates counts for each user-website combination and orders it by the most visits. This is then passed through the middle query that ranks the records per user
, giving the rank in the column r
.
This is a variation, which will show ALL websites for a user that have EQUAL rank. The difference with the previous query is that if websites A and B both have 10 visits from user X, both A and B are listed in the result whereas the previous query selects one randomly to show.
select userid, websiteid, visits
from
(
select
userid, websiteid, visits,
@r := case
when @u=userid and @v=visits then @r # same rank
when @u=userid then @r+1 # next rank
else 1 # different user
end r,
@u := userid, @v := visits
from
(select @u:=null, @v:=null) x,
(select userid, websiteid, count(*) visits
from visit
group by userid, websiteid
order by userid, visits desc) y
) z
where r=1
This is the test table used
create table visit (userid int, websiteid int);
insert into visit select 1,1;
insert into visit select 1,2;
insert into visit select 1,1;
insert into visit select 1,4;
insert into visit select 2,1;
insert into visit select 2,2;
insert into visit select 2,5;
insert into visit select 2,2;
Upvotes: 0
Reputation: 135808
I've introduced a temporary table just to make things a little easier to read. You could certainly substitute that query inline and eliminate the temp table if desired.
The first select sums the visits per user and website.
The second select finds the maximum visits for each user in the subquery and then joins back to the temp table to find the webiste whose count matches that maximum value.
create temporary table TempSum
select user-ID, website-ID, count(*) as TotalCount
from YourTable
group by user-ID, website-ID
select ts.user-ID, ts.website-ID, ts.TotalCount
from (select user-ID, max(TotalCount) as MaxCount
from TempSum
group by user-ID) ms
inner join TempSum ts
on ts.user-ID = ms.user-ID
and ts.TotalCount = ms.MaxCount
Upvotes: 3
Reputation: 1514
I am not sure I get your question right. If you would like to get the website-ID with the most visits, you can use this:
select count(user-ID) as cnt from dataTable group by website-ID order by cnt desc
Upvotes: 0