Andrej
Andrej

Reputation: 3849

Count duplicate rows in SQL

Suppose we have the following data structure:

s_cui   o_cui   year
D000001 D000001 1975
D000001 D000001 1976
D000001 D002413 1976
D000001 D002413 1979
D000001 D002413 1987
D000001 D004298 1976
D000002 D000002 1985
D000003 D000900 1975
D000003 D000900 1990
D000003 D004134 1983
D000003 D004134 1986

I need to count duplicate pairs (in the 1st and 2nd column) and to each such pair assign the lowest value from the 3rd column. For this example the output should be:

s_cui   o_cui   freq  year
D000001 D000001 2     1975
D000001 D002413 3     1976
D000001 D004298 1     1976
D000002 D000002 1     1985
D000003 D000900 2     1975
D000003 D004134 2     1983

My initial attempt is here (without year field):

SELECT s_cui, o_cui, COUNT(*) FROM table GROUP BY s_cui, o_cui;

Upvotes: 0

Views: 81

Answers (1)

xQbert
xQbert

Reputation: 35353

You're close! You just need to add min(year).

It looks like you want the earliest occurrence of a year for each s_cui, o_cui and count.

SELECT s_cui, o_cui, COUNT(*) freq, min(`year`) as `year`
FROM table 
GROUP BY s_cui, o_cui;

Min(year) will return the earliest year for each pairing of s_cui, and o_cui while still retaining the counts. I put ` around the year as it is a reserved word in mySQL.

though I don't see how "pairing" takes effect more of each "group"

Upvotes: 1

Related Questions