Reputation: 45
I work at a publishing site. I'm interested in developing a model that can predict a user's affinity for a piece or set of content based on the content they have previously engaged with.
Content is classified via categories and tags. Engagement per item could be binary (clicked on) or a 0-1 float value (normalized length of time engaged).
How should I train a model that will allow me to personalize effectively per user?
I don't need realtime access to recommendations. Ideally I would retrain the model weekly with new clickstream data, and batch download data describing each user's top categories and tags with an affinity score.
Thanks.
Upvotes: 0
Views: 227
Reputation: 731
Working backwards from your use case, the user-personalization recipe is where you should start. This recipe is designed to recommend items (content in your case) to users based on their previous interactions with items/content.
The primary input into this recipe (and all Personalize recipes for that matter) is interactions/events. For you this would be the clicks/views of content. If you have historical data for these clicks, you can prepare a CSV from it. The minimum required fields are USER_ID, ITEM_ID, and TIMESTAMP, where each row represents a moment in time when a specific user interacted with an item. You can optionally include an EVENT_TYPE column and an EVENT_VALUE column. The values for EVENT_TYPE depend on your application and event taxonomy. If you're just tracking clicks right now, you can use click or view as the event type and then add support for more event types in the future (e.g. bookmark, favorite, etc.) as needed. For EVENT_VALUE (type float), you could use your normalized length of time engaged. You can use EVENT_VALUE to filter which events are included in training by specifying an eventType and eventValueThreshold when creating your solution. For example, if you consider any value equal to or greater than, say, 0.4 to indicate positive interest by a user in a piece of content, you can set an eventValueThreshold of 0.4 and Personalize will only include interactions equal to or above that value in training. Personalize will also include the event value as a feature in the model, but it won't be used to weight or reward interactions based on this value.
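To make the interactions file and the threshold setting concrete, here is a minimal sketch. The input record keys (user_id, item_id, timestamp, engagement), the solution name, and the ARN placeholder are assumptions for illustration only; the column names and the eventType / eventValueThreshold settings are the ones described above. As far as I know, boto3's create_solution accepts the threshold as a string inside solutionConfig, but check that against the current API reference before relying on it.

```python
import csv
import io


def build_interactions_csv(events, event_type="view"):
    """Turn clickstream records into CSV text for the interactions dataset.

    `events` is an iterable of dicts with the hypothetical keys
    user_id, item_id, timestamp (Unix epoch seconds) and engagement (0-1 float).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE", "EVENT_VALUE"])
    for e in events:
        writer.writerow([
            e["user_id"],
            e["item_id"],
            int(e["timestamp"]),          # whole seconds since the epoch
            event_type,
            f'{e["engagement"]:.3f}',     # normalized time engaged
        ])
    return buf.getvalue()


# Arguments you might pass to boto3.client("personalize").create_solution(...).
# The name and dataset group ARN are placeholders, not real values.
solution_kwargs = {
    "name": "content-affinity",
    "datasetGroupArn": "<your-dataset-group-arn>",
    "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization",
    "eventType": "view",
    "solutionConfig": {"eventValueThreshold": "0.4"},
}

sample = [{"user_id": "u1", "item_id": "article-9",
           "timestamp": 1700000000, "engagement": 0.62}]
csv_text = build_interactions_csv(sample)
```

With this configuration, only view events whose EVENT_VALUE is 0.4 or higher would be used in training; the sample row above would qualify.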
The user-personalization recipe will also consider the items and users datasets, if provided. For your use case, providing an items dataset is where you'd specify the categories and tags for each piece of content (item). You can also include the raw text for each piece of content as a textual field in your items dataset. Personalize will automatically extract features from your textual field to improve the relevance of recommendations.
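As a rough illustration of the items dataset, the sketch below builds an Avro-style schema as a Python dict and serializes it to JSON. The field names CATEGORIES, TAGS, and BODY_TEXT are hypothetical choices for this use case; the "categorical" and "textual" attributes are how I understand Personalize distinguishes those field types, and multiple category or tag values for one item are normally packed into a single string separated by "|". Treat the details as assumptions to verify against the schema documentation.

```python
import json

# Sketch of an items schema; field names other than ITEM_ID are illustrative.
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        # e.g. "sports|local-news" for an item in two categories
        {"name": "CATEGORIES", "type": ["null", "string"], "categorical": True},
        {"name": "TAGS", "type": ["null", "string"], "categorical": True},
        # raw article text, from which features are extracted automatically
        {"name": "BODY_TEXT", "type": ["null", "string"], "textual": True},
    ],
    "version": "1.0",
}

schema_json = json.dumps(items_schema, indent=2)


def join_labels(labels):
    """Pack a list of categories or tags into one pipe-separated field value."""
    return "|".join(labels)
```

The JSON string is what you would supply when creating the schema for the items dataset, and join_labels shows how a row's CATEGORIES or TAGS value could be assembled from your existing classification.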
Once you have your datasets imported into a dataset group, you can create a solution using the user-personalization recipe and then a solution version (which represents the trained model). To get batch recommendations weekly, you would use a batch inference job each week to generate recommendations for each user. The output of the batch inference job can then be processed to determine the category and tag affinities for each user based on the recommended content.
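One possible way to do that last processing step is sketched below. It assumes the batch inference output is JSON Lines with one record per user in the shape {"input": {"userId": ...}, "output": {"recommendedItems": [...], "scores": [...]}, "error": ...}, and that you have your own lookup from item ID to its categories and tags (item_labels here is hypothetical). Summing the recommendation scores per label and scaling by the user's top label is just one reasonable aggregation, not something Personalize prescribes.

```python
import json
from collections import defaultdict


def user_affinities(output_lines, item_labels):
    """Aggregate recommended-item scores into per-user label affinities.

    output_lines: iterable of JSON strings from the batch job output.
    item_labels:  dict mapping item ID -> list of categories/tags.
    Returns {user_id: {label: affinity in (0, 1]}}.
    """
    result = {}
    for line in output_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("error"):
            continue  # skip users the job could not score
        user_id = record["input"]["userId"]
        out = record["output"]
        totals = defaultdict(float)
        for item_id, score in zip(out["recommendedItems"], out["scores"]):
            for label in item_labels.get(item_id, ()):
                totals[label] += score
        top = max(totals.values(), default=0.0)
        # Scale so each user's strongest label has affinity 1.0.
        result[user_id] = (
            {label: total / top for label, total in totals.items()} if top > 0 else {}
        )
    return result


example_lines = [
    '{"input": {"userId": "u1"}, "output": {"recommendedItems": ["a", "b"], '
    '"scores": [0.5, 0.25]}, "error": null}'
]
example_labels = {"a": ["sports"], "b": ["sports", "politics"]}
affinities = user_affinities(example_lines, example_labels)
```

Running this weekly over the batch output would give you a per-user table of category and tag affinities that you can export in whatever format your downstream systems need.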
Upvotes: 1