Reputation: 8882
Both users and pages on my website have IDs. When a user goes on a certain page, their userID and the pageID will be written to a MySQL table as such:
userID | pageID
3 | 1
2 | 1
3 | 2
etc...
In this table, called user_pages
, I would end up with a bunch of raw data that can be turned into a recommendation engine. What I mean by recommendation engine - I want to analyze historical data, and be able to predict, based on a set of viewed pages, the next pages that a user may like. Let's say there is a strong correlation between visiting page with ID 3 after going to pages with IDs 4, 9, 15. If a user goes on pages 4, 9, and 15, then the engine should recommend page 3.
I think I have all of the data input code necessary for creating this. How would I write something that analyzes the data for correlation of pages (i.e. almost everyone who visited page 5 visited page 1 also), and somehow use that to predict in the future the pages that a user may end up liking?
Upvotes: 2
Views: 4260
Reputation: 5103
Recommendation systems are a big part of A.I research. I believe you are interested in a collection of algorithms called collaborative filtering. Since the netflix prize in 2007 this field has developed greatly. I would recommend going here and having a read. It explains the basic concepts of recommender systems in a succinct and clear way and also provides a link to Java source code for an approach to the Netflix project, MemReader. You could examine this source code and extrapolate the basic algorithms for building a recommendation engine.
Alternatively if you want a more mathematical explanation of the algorithms employed go here.
It shouldn't take too long to implement at all.
Upvotes: 8
Reputation: 2483
This post posed a similar question: Advanced MySQL: Find correlations between poll responses
I think you would be able to generate a similar response if your primary data table had one additional field in it, specifically the id of the page the used last visited or visited immediately following.
Something like this:
+------+----------+--------------+----------+
| id | page_id | next_page_id | user_id |
+------+----------+--------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
+------+----------+--------------+----------+
Then you should be able to use a modified version of one of the SQL queries suggested there to generate a list of high-correlation recommendations between the current page and the next page.
Upvotes: 1