Reputation: 97

How to assign most repeated value of a column to each unique value of another column?

This table is an aggregated table and count means number of rows we have in the data for the first four columns. I want to assign one and just one organization id to each customer (organization with maximum number of appearance in the data "organization which has highest sum of count").

customer name   organization id     item    city    count
Jan Tomas        3478               cloth    Rom     20
Jan Tomas        3478               cloth    Milan   12
Jan Tomas        3478               shoe     Munich  14
Jan Tomas        3478               shoe     Rom      5
Jan Tomas        653                cloth    Berlin  10
Jan Tomas        653                shoe     Brussels 5
Jan Tomas        123                cloth    Paris   12
Jan Tomas        123                cloth    Rom     14
Martin Muller    654                cloth    Rom     15
Martin Muller    654                cloth    Berlin  16
Martin Muller    654                shoe     Rom      7
Martin Muller    980                cloth    Milan   28
Martin Muller    980                shoe     Paris   19
Janatan Kery     765                cloth    Rom     20
Janatan Kery     765                cloth    Munich  11
Janatan Kery     765                shoe     Rom     22
Janatan Kery     476                cloth    Milan   2
Janatan Kery     476                cloth    Rom     24

I would like to have the output as follow. Any help from you would be really appreciated. This is just sample of data . I have more than 2 millions unique customers.

customer name     organization id    item    city    count
Jan Tomas          3478              cloth    Rom     20
Jan Tomas          3478              cloth    Milan   12
Jan Tomas          3478              shoe     Munich  14
Jan Tomas          3478              shoe     Rom      5
Martin Muller      980               cloth    Milan   28
Martin Muller      980               shoe     Paris   19
Janatan Kery       765               cloth    Rom     20
Janatan Kery       765               cloth    Munich  11
Janatan Kery       765               shoe     Rom     22

Upvotes: 0

Answers (2)

Aaron Dietz

Reputation: 10297

Try this, it uses ROW_NUMBER() to find the largest SUM(Count) per CustomerName/OrganizationID, then joins back to the main table for the rest of the columns:

SELECT yt.*
FROM YourTable yt
JOIN (SELECT CustomerName, 
             OrganizationID, 
             ROW_NUMBER() OVER (PARTITION BY CUST ORDER BY SUM(Count) DESC) RN
      FROM YourTable
      GROUP BY CustomerName, OrganizationID) A ON A.RN = 1
                                              AND A.CustomerName = yt.CustomerName
                                              AND A.OrganizationID = yt.OrganizationID

Upvotes: 1

Matthew Baker

Reputation: 2729

This would be how I'd do it in MS SQL Server. Maybe it will lead you in the right direction.

WITH    cte
      AS (SELECT    [customer name]
          ,         [organization id]
          ,         item
          ,         city
          ,         COUNT
          ,         ROW_NUMBER() OVER (PARTITION BY item, city ORDER BY COUNT DESC) AS RN
          FROM      TableName
         )
SELECT  [customer name]
,       [organization id]
,       item
,       city
,       COUNT
FROM    cte
WHERE   RN = 1

Upvotes: 0

How to assign most repeated value of a column to each unique value of another column?

Answers (2)

Related Questions