Ian Wood
Ian Wood

Reputation: 6573

mysql distinct group_concat values

I have a bunch of user accounts that I need to create associations with based on DOB postcode etc.

I have the following query:

SELECT DISTINCT CONCAT_WS(' , ' ,a.user_id , GROUP_CONCAT( b.user_id SEPARATOR ' , ' ) )
FROM tbl_users_details a,
tbl_users_details b
WHERE a.user_id != b.user_id
AND a.date_of_birth = b.date_of_birth
AND a.postcode = b.postcode
AND LEVENSHTEIN_RATIO( a.last_name , b.last_name ) > 60
GROUP BY a.user_id

To demonstrate my requirements...

If accounts 1 5 9 and 12 meet the criteria (ie these are the same people)

I will get 4 results in the format

1  , 5 , 9 , 12
5  , 1 , 9 , 12
9  , 1 , 5 , 12
12 , 1 , 5 , 9

I deally I'd like just 1,5,9,12

Any pointers would be great.

thanks people.

Upvotes: 1

Views: 1656

Answers (4)

Ilmari Karonen
Ilmari Karonen

Reputation: 50328

In general, I would do something like this:

SELECT GROUP_CONCAT( user_id )
FROM tbl_users_details
GROUP BY date_of_birth, postcode, last_name

but the Levenshtein distance check makes this problematic, since there's actually no guarantee that LEVENSHTEIN_RATIO(x, y) > n and LEVENSHTEIN_RATIO(y, z) > n imply LEVENSHTEIN_RATIO(x, z) > n. (For example, what if one of your users was named "Anderson", another "Addison" and a third "Atkinson"?) You might want to consider using some other similarity estimation method that actually maps names into distinct groups, such as soundex or metaphone:

SELECT GROUP_CONCAT( user_id )
FROM tbl_users_details
GROUP BY date_of_birth, postcode, SOUNDEX(last_name)

Upvotes: 1

Ian Wood
Ian Wood

Reputation: 6573

reckon I got it....

SELECT  GROUP_CONCAT(ida ORDER BY ida ASC SEPARATOR ' , ') ids
FROM
(SELECT LEAST(a.user_id, b.user_id ) idbase,a.user_id ida, b.user_id idb
FROM apollo.tbl_users_details a,
apollo.tbl_users_details b
WHERE a.user_id != b.user_id
AND a.date_of_birth = b.date_of_birth
AND a.postcode = b.postcode
AND LEVENSHTEIN_RATIO( a.last_name , b.last_name ) > 60
GROUP BY a.user_id) as sub
GROUP BY idbase;

Running on full data set to test..

Upvotes: 0

shanmugamgsn
shanmugamgsn

Reputation: 840

Can you be more clear in your requirement??

anyways try using Subquery like

Select CONCAT (user.i,',)
from 
(Select Distinct ...... --- ur old code ---- )

Thanks, Shanmugam

Upvotes: 2

Devart
Devart

Reputation: 121922

You can include ORDER BY clause into the GROUP_CONCAT function -

... GROUP_CONCAT(b.user_id SEPARATOR ' , ' ORDER BY b.user_id)

Upvotes: 0

Related Questions