mohamed naser

Reputation: 492

Building a user-based collaborative filtering system in Django

I'm trying to build a simple user-based collaborative filtering system in Django for an e-commerce site, using only the purchase history.
Here are the steps I use. I know it needs improvement, but I have no idea what the next move should be.

Here's the product model:

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()

Here's the purchase model:

from django.contrib.auth.models import User  # assuming the default Django user model

class Purchase(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    purchase_date = models.DateTimeField(auto_now_add=True)

Now, to get similar users:

def find_similar_users(user, k=5):
    all_users = User.objects.exclude(id=user.id)
    similarities = [(other_user, jaccard_similarity(user, other_user)) for other_user in all_users]
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:k]]

And to calculate the similarity between two users:

def jaccard_similarity(user1, user2):
    user1_purchases = set(Purchase.objects.filter(user=user1).values_list('product_id', flat=True))
    user2_purchases = set(Purchase.objects.filter(user=user2).values_list('product_id', flat=True))

    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)

    return len(intersection) / len(union) if len(union) > 0 else 0
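
As a quick sanity check of the formula, here is the same computation on plain Python sets. The product IDs are made up for illustration, and `jaccard` is a hypothetical standalone helper, not the Django function above:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical product IDs bought by two users.
alice = {1, 2, 3}
bob = {2, 3, 4, 5}

# 2 shared products out of 5 distinct products overall.
score = jaccard(alice, bob)
print(score)
```

Two users with no purchases at all get a similarity of 0 rather than a division by zero, which mirrors the `len(union) > 0` guard above.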

Here's my entry function:

def recommend_products(user, k=5):
    similar_users = find_similar_users(user, k)
    recommended_products = set()

    for similar_user in similar_users:
        purchases = Purchase.objects.filter(user=similar_user).exclude(product__in=recommended_products)
        for purchase in purchases:
            recommended_products.add(purchase.product)

    return recommended_products
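
One possible refinement, sketched here on plain dictionaries rather than Django models: rank the candidate products by the summed similarity of the neighbours who bought them, and skip products the user already owns. Both choices are assumptions about the desired behaviour, and the names `recommend` and `purchases` are hypothetical:

```python
from collections import defaultdict


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user_id, k=5):
    """purchases maps user id -> set of product ids.

    Returns product ids ranked by the summed similarity of the k most
    similar users who bought them, excluding the user's own purchases.
    """
    own = purchases.get(user_id, set())
    # Score every other user, then keep the k most similar ones.
    neighbours = sorted(
        (
            (jaccard(own, items), other)
            for other, items in purchases.items()
            if other != user_id
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    scores = defaultdict(float)
    for similarity, other in neighbours:
        if similarity == 0:
            continue  # users with nothing in common carry no signal
        for product_id in purchases[other] - own:
            scores[product_id] += similarity

    # Highest score first; ties broken by product id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Made-up purchase history for four users.
history = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 5}, "d": {9}}
ranked = recommend(history, "a")
print(ranked)
```

Returning an ordered list instead of a set lets the caller show the strongest suggestions first.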

Obviously, this will be really slow, so I was thinking of keeping a copy of the data in a separate NoSQL database.

When user A purchases something, I would copy the data to the other database, run the calculation in a background service such as Celery, and store the resulting recommended products in the NoSQL database, so they can simply be retrieved later for user A when needed. Is that the right approach?

Upvotes: 2

Views: 161

Answers (1)

willeM_ Van Onsem

Reputation: 477607

You can boost efficiency a lot with:

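def find_similar_users(user, k=5):
    # prefetch_related loads the purchases of all other users in one extra query
    all_users = User.objects.exclude(id=user.id).prefetch_related('purchase_set')
    similarities = [
        (other_user, jaccard_similarity(user, other_user))
        for other_user in all_users
    ]
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [user_similarity[0] for user_similarity in similarities[:k]]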


def jaccard_similarity(user1, user2):
    user1_purchases = {
        purchase.product_id for purchase in user1.purchase_set.all()
    }
    user2_purchases = {
        purchase.product_id for purchase in user2.purchase_set.all()
    }

    intersection = user1_purchases.intersection(user2_purchases)
    union = user1_purchases.union(user2_purchases)

    return len(intersection) / len(union) if len(union) > 0 else 0

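This retrieves the purchases of all other users in bulk, so loading them takes two queries (one for the users, one for their purchases) instead of one query per user, which is probably where the bottleneck is anyway. Note that user.purchase_set.all() for the target user is not covered by the prefetch unless that user object was itself loaded with prefetch_related.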
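
As a variation on the same bulk-loading idea, the per-user purchase sets can be built once from (user id, product id) pairs. This is only a sketch on plain tuples: `group_purchases` is a hypothetical helper, and in Django the pairs could presumably come from a single query such as `Purchase.objects.values_list('user_id', 'product_id')`:

```python
from collections import defaultdict


def group_purchases(pairs):
    """Group (user_id, product_id) pairs into a dict of user_id -> set of product ids."""
    by_user = defaultdict(set)
    for user_id, product_id in pairs:
        by_user[user_id].add(product_id)
    return dict(by_user)


# Made-up rows; the repeated (1, 10) models a product bought twice.
rows = [(1, 10), (1, 11), (2, 11), (2, 12), (1, 10)]
purchase_sets = group_purchases(rows)
print(purchase_sets)
```

With the sets held in memory, every similarity comparison becomes a pure set operation with no further database access.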

Upvotes: 1
