Reputation: 660
I want to find most frequent product each customer has purchased. my data set is like this :
CustomerID ProdID FavouriteProduct
1 A ?
1 A ?
1 A ?
1 B ?
1 A ?
1 A ?
1 A ?
1 B ?
2 A ?
2 AN ?
2 G ?
2 C ?
2 C ?
2 F ?
2 D ?
2 C ?
There are so many products,So i cannot put them in a pivot table.
Answer would look like this :
CustomerID ProdID FavouriteProduct
1 A A
1 A A
1 A A
1 B A
1 A A
1 A A
1 A A
1 B A
2 A C
2 AN C
2 G C
2 C C
2 C C
2 F C
2 D C
2 C C
The query may look like this:
Update table
set FavouriteProduct = (Select
CustomerID, Product, Max(Count(Product))
From Table
group by CustomerID, Product) FP
Upvotes: 1
Views: 1968
Reputation: 21
To return the rows exactly as you have described in the question, you could try using a table expression (I used a CTE in my example) to first return a popularity ranking, where the higher the number, the more popular the product for each customer.
WITH RankTable AS (
SELECT
CustomerID, ProductID, COUNT(*) AS Popularity
FROM TableA
GROUP BY CustomerID, ProductID
)
Then the full results table can be returned by first performing an inner join on the original table (TableA) and the table expression (RankTable), and then using a window function to create the values in the FavoriteProduct column.
SELECT
P.CustomerID
, P.ProductID
, FIRST_VALUE(P.ProductID) OVER(
PARTITION BY R.CustomerID
ORDER BY R.Popularity DESC, R.ProductID) AS FavoriteProduct
FROM TableA AS P
INNER JOIN RankTable AS R
ON P.CustomerID = R.CustomerID
AND P.ProductID= R.ProductID;
Upvotes: 2
Reputation: 19194
This one, based on the example at the end of this page: http://www.sql-server-performance.com/2006/find-frequent-values/ might be quicker:
SELECT CustomerID, ProdID, Cnt
FROM
(
SELECT CustomerID, ProdID, COUNT(*) as Cnt,
RANK() OVER (
PARTITION BY CustomerID
ORDER BY COUNT(*) DESC
) AS Rnk
FROM YourTransactionTable
GROUP BY CustomerID, ProdID
) x
WHERE Rnk = 1
This one uses a RANK()
function. In this case you don't have to join back to the same table (meaning there's a lot less work required)
Now to update your existing data, I like to wrap my dataset in a WITH to make debugging a little easier and the final update a bit simpler:
;WITH
(
SELECT CustomerID, ProdID, Cnt
FROM
(
SELECT CustomerID, ProdID, COUNT(*) as Cnt,
RANK() OVER (PARTITION BY CustomerID
ORDER BY COUNT(*) DESC) AS Rnk
FROM TransactionTable
GROUP BY CustomerID, ProdID
) x
WHERE Rnk = 1
) As SRC
UPDATE FavouriteTable
SET Favourite = SRC.ProdID
FROM SRC
WHERE SRC.CustomerID = Favourite.CustomerID
Upvotes: 1
Reputation: 1270011
Another way to get the most frequent product is to use row_number()
:
select customerid, productid,
max(case when seqnum = 1 then productid end) over (partition by customerid) as favoriteproductid
from (select customerid, productid, count(*) as cnt,
row_number() over (partition by customerid order by count(*) desc) as seqnum
from customer c
group by customerid, productid
) cp;
Upvotes: 2
Reputation: 12317
Just in case your SQL doesn't perform fast enough and you have customers also in a smaller table, this might work better::
select C.CustomerId, R.ProductID
from Customer C
outer apply (
Select top 1 ProductID,Count(*) as Number
from table A
where A.CustomerId = C.CustomerId
group by ProductId
order by Number desc
) R
Upvotes: 1
Reputation: 660
Thanks to Nick, i found a way to find the most frequent value. i share with you how it works :
Select CustomerID,ProductID,Count(*) as Number
from table A
group by CustomerID,ProductID
having Count(*)>= (Select Max(Number) from (Select CustomerID,ProductID,Count(*) as Number from table B where B.CustomerID= A.CustomerID group by CustomerID,Product)C)
Upvotes: 1