bfritz
bfritz

Reputation: 2422

SQL Nearest Neighbor Query (Movie Recommendation Algorithm)

Need help making this (sort of) working query more dynamic.

I have three tables myShows, TVShows and Users

Would like to take this query and change it to a stored procedure that I can send a User ID into and have it do the rest...

SELECT showId, name, Count(1) AS no_users
FROM
    myShows LEFT OUTER JOIN
              tvshows ON myShows.Show = tvshows.ShowId
WHERE
    [user] IN (
        SELECT [user]
           FROM
               myShows
           WHERE
               show ='1' or show='4'
          )
    AND
    show <> '1' and show <> '4'
GROUP BY
    showId, name
ORDER BY
    no_users DESC

This right now works. But as you can see the problem lies within the WHERE (show ='1' or show='4') and the AND (show <> '1' and show <> '4') statements which is currently hard-coded values, and that's what I need to be dynamic, being I have no idea if the user has 3 or 30 shows I need to check against.

Also how inefficient is this process? this will be used for a iPad application that might get a lot of users. I currently run a movie API (IMDbAPI.com) that gets about 130k hits an hour and had to do a lot of database/code optimization to make it run fast. Thanks again!

If you want the database schema for testing let me know.

Upvotes: 2

Views: 1015

Answers (1)

podiluska
podiluska

Reputation: 51494

This will meet your requirements

select name, count(distinct [user]) from myshows recommend
inner join tvshows on recommend.show = tvshows.showid
where [user] in 
(   
    select other.[user] from 
        ( select show from myshows where [User] = @user ) my,
        ( select show, [user] from myshows where [user] <> @user ) other
    where my.show = other.show
)   
and show not in ( select show from myshows where [User] = @user ) 
group by name
order by count(distinct [user]) desc

If your SQL platform supports WITH Common Table Expressions, the above can be optimized to use them.

Will it be efficient as the data sizes increase? No. Will it be effective? No. If just one user shares a show with your selected user, and they watch a popular show, then that popular show will rise to the top of the ranking.

I'd recommend

a) reviewing your thinking of what recommends a show

b) periodically calculating the results rather than performing it on demand.

Upvotes: 1

Related Questions