AKor

Reputation: 8882

Constructing a simple recommendation engine

Both users and pages on my website have IDs. When a user visits a page, their userID and the pageID are written to a MySQL table like this:

 userID | pageID
    3   |    1
    2   |    1
    3   |    2
       etc...

This table, called user_pages, would accumulate raw data that could drive a recommendation engine. By recommendation engine I mean this: I want to analyze historical data and predict, from the set of pages a user has viewed, which pages they may like next. Say visits to page 3 are strongly correlated with earlier visits to pages 4, 9, and 15. If a user visits pages 4, 9, and 15, the engine should recommend page 3.

I think I have all of the data-collection code I need. How would I write something that analyzes the data for correlations between pages (e.g. almost everyone who visited page 5 also visited page 1) and uses them to predict which pages a user may end up liking?

Upvotes: 2

Views: 4260

Answers (2)

GordyD

Reputation: 5103

Recommendation systems are a big part of AI research. I believe you are interested in a family of algorithms called collaborative filtering. The field has developed greatly since the Netflix Prize in 2007. I would recommend going here and having a read. It explains the basic concepts of recommender systems succinctly and clearly, and it links to Java source code for one approach to the Netflix project, MemReader. You could study that source code and extract the basic algorithms for building a recommendation engine.

Alternatively, if you want a more mathematical explanation of the algorithms involved, go here.

It shouldn't take too long to implement at all.
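
To give a feel for the idea, here is a minimal sketch of item-based collaborative filtering over (userID, pageID) pairs like the rows in user_pages. It is not taken from the linked material: the function names, the choice of Jaccard similarity, and the sample data are all assumptions made for illustration. Each candidate page is scored by how much its audience overlaps with the audiences of the pages the user has already viewed.

```python
from collections import defaultdict


def build_page_audiences(visits):
    """Map each page ID to the set of user IDs that viewed it."""
    users_by_page = defaultdict(set)
    for user_id, page_id in visits:
        users_by_page[page_id].add(user_id)
    return users_by_page


def jaccard(a, b):
    """Overlap of two user sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(visits, viewed_pages, top_n=3):
    """Rank unseen pages by summed Jaccard similarity to the viewed pages."""
    users_by_page = build_page_audiences(visits)
    viewed = set(viewed_pages)
    scores = defaultdict(float)
    for candidate, candidate_users in users_by_page.items():
        if candidate in viewed:
            continue  # never recommend a page the user has already seen
        for page in viewed:
            if page in users_by_page:
                scores[candidate] += jaccard(candidate_users, users_by_page[page])
    # Highest score first; ties broken by page ID for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, score in ranked[:top_n] if score > 0]


# Hypothetical sample data: (userID, pageID) pairs, as in user_pages.
sample_visits = [
    (1, 4), (1, 9), (1, 15), (1, 3),
    (2, 4), (2, 9), (2, 15), (2, 3),
    (3, 4), (3, 7),
    (4, 15), (4, 3),
]

print(recommend(sample_visits, [4, 9, 15]))  # -> [3, 7]
```

In this made-up data, page 3 shares most of its audience with pages 4, 9, and 15, so it ranks first. On a real site you would probably precompute the page-to-page similarities rather than rebuild them on every request.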

Upvotes: 8

stevecomrie

Reputation: 2483

This post posed a similar question: Advanced MySQL: Find correlations between poll responses

I think you would be able to generate a similar result if your primary data table had one additional field: the ID of the page the user visited immediately before or immediately after the current one.

Something like this:

+------+----------+--------------+----------+
| id   | page_id  | next_page_id | user_id  |
+------+----------+--------------+----------+
|    1 | 1        | 1            | 1        |
|    2 | 1        | 2            | 2        |
|    3 | 1        | 2            | 3        |
|    4 | 1        | 2            | 4        |
|    5 | 2        | 3            | 1        |
|    6 | 2        | 3            | 2        |
|    7 | 2        | 3            | 3        |
|    8 | 2        | 4            | 4        |
|    9 | 3        | 5            | 1        |
+------+----------+--------------+----------+

Then you should be able to use a modified version of one of the SQL queries suggested there to generate a list of high-correlation recommendations between the current page and the next page.
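
As a rough illustration (this is not the query from the linked post), the sketch below loads the sample rows above into an in-memory SQLite database and counts, for a given page, which next pages follow it most often. The table name page_visits, the function name, and the share column are assumptions; the SELECT itself is plain SQL that should need little or no change for MySQL.

```python
import sqlite3

# Sample rows from the table above: (id, page_id, next_page_id, user_id).
ROWS = [
    (1, 1, 1, 1), (2, 1, 2, 2), (3, 1, 2, 3), (4, 1, 2, 4),
    (5, 2, 3, 1), (6, 2, 3, 2), (7, 2, 3, 3), (8, 2, 4, 4),
    (9, 3, 5, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_visits ("
    "id INTEGER PRIMARY KEY, page_id INTEGER, "
    "next_page_id INTEGER, user_id INTEGER)"
)
conn.executemany("INSERT INTO page_visits VALUES (?, ?, ?, ?)", ROWS)


def next_page_recommendations(conn, page_id, limit=3):
    """Return (next_page_id, hits, share) rows for the pages that most
    often follow page_id, ignoring rows where the page follows itself.
    share is hits divided by all recorded visits to page_id."""
    query = """
        SELECT next_page_id,
               COUNT(*) AS hits,
               COUNT(*) * 1.0 / (SELECT COUNT(*)
                                 FROM page_visits
                                 WHERE page_id = :page) AS share
        FROM page_visits
        WHERE page_id = :page
          AND next_page_id <> :page
        GROUP BY next_page_id
        ORDER BY hits DESC, next_page_id
        LIMIT :limit
    """
    return conn.execute(query, {"page": page_id, "limit": limit}).fetchall()


print(next_page_recommendations(conn, 2))  # -> [(3, 3, 0.75), (4, 1, 0.25)]
```

With the sample rows, three of the four recorded visits to page 2 were followed by page 3, so page 3 comes out as the strongest recommendation for someone currently on page 2.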

Upvotes: 1
