Anish N Sharma
Anish N Sharma

Reputation: 23

Count number occurrences of a String in a table and report it in a new column

Input

Name
A
A
B
B
B
B
C
B
C
A
D

Required Output

Name    Occurance
A   1
A   2
B   1
B   2
B   3
B   4
C   1
B   5
C   2
A   3
D   1

Please Note - I have more than 100000 unique entries in the name column and hence I cannot hardcode name in the code

Upvotes: 0

Views: 440

Answers (2)

Paul Maxwell
Paul Maxwell

Reputation: 35583

The expected result shows (for example) that A occurs 3 times and gets numbered 1, 2, 3 even though they don't all occur together. A single row_number() will suffice: SQL Fiddle

CREATE TABLE Table1 (ID int identity(1,1) primary key, [Name] varchar(1));   
INSERT INTO Table1 ([Name])
VALUES     ('A'),('A'),('B'),('B'),('B'),('B'),('C'),('B'),('C'),('A'),('D');

Query 1:

select
       name
     , row_number() over (partition by name order by id) as Occurance
from table1
order by id

As tables are considered unordered sets, to reproduce the sequence shown in the expected result there would have to be some other column(s) for that. I have assumed an identity column ID for that.

Results:

| name | Occurance |
|------|-----------|
|    A |         1 |
|    A |         2 |
|    B |         1 |
|    B |         2 |
|    B |         3 |
|    B |         4 |
|    C |         1 |
|    B |         5 |
|    C |         2 |
|    A |         3 |
|    D |         1 |

If you aren't familiar with these, I recommend you do become so as they are extremely handy:

  1. ROW_NUMBER()
  2. SELECT - OVER()

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269703

SQL tables represent unordered sets. Ordering is provided only by columns that specify the ordering.

Assuming you have an ordering -- such as an identity id column -- you can use the difference of row numbers approach:

select name, count(*)
from (select name,
             row_number() over (order by id) as seqnum,
             row_number() over (partition by name order by id) as seqnum_n
      from t
     ) t
group by name, (seqnum - seqnum_n);

To understand why this works, you have to stare at the results of the subquery until you "get" why the difference defines adjacent values.

Upvotes: 1

Related Questions