Reputation: 6573
I have a bunch of user accounts that I need to create associations with based on DOB postcode etc.
I have the following query:
SELECT DISTINCT CONCAT_WS(' , ' ,a.user_id , GROUP_CONCAT( b.user_id SEPARATOR ' , ' ) )
FROM tbl_users_details a,
tbl_users_details b
WHERE a.user_id != b.user_id
AND a.date_of_birth = b.date_of_birth
AND a.postcode = b.postcode
AND LEVENSHTEIN_RATIO( a.last_name , b.last_name ) > 60
GROUP BY a.user_id
To demonstrate my requirements...
If accounts 1 5 9 and 12 meet the criteria (ie these are the same people)
I will get 4 results in the format
1 , 5 , 9 , 12
5 , 1 , 9 , 12
9 , 1 , 5 , 12
12 , 1 , 5 , 9
I deally I'd like just 1,5,9,12
Any pointers would be great.
thanks people.
Upvotes: 1
Views: 1656
Reputation: 50328
In general, I would do something like this:
SELECT GROUP_CONCAT( user_id )
FROM tbl_users_details
GROUP BY date_of_birth, postcode, last_name
but the Levenshtein distance check makes this problematic, since there's actually no guarantee that LEVENSHTEIN_RATIO(x, y) > n
and LEVENSHTEIN_RATIO(y, z) > n
imply LEVENSHTEIN_RATIO(x, z) > n
. (For example, what if one of your users was named "Anderson", another "Addison" and a third "Atkinson"?) You might want to consider using some other similarity estimation method that actually maps names into distinct groups, such as soundex or metaphone:
SELECT GROUP_CONCAT( user_id )
FROM tbl_users_details
GROUP BY date_of_birth, postcode, SOUNDEX(last_name)
Upvotes: 1
Reputation: 6573
reckon I got it....
SELECT GROUP_CONCAT(ida ORDER BY ida ASC SEPARATOR ' , ') ids
FROM
(SELECT LEAST(a.user_id, b.user_id ) idbase,a.user_id ida, b.user_id idb
FROM apollo.tbl_users_details a,
apollo.tbl_users_details b
WHERE a.user_id != b.user_id
AND a.date_of_birth = b.date_of_birth
AND a.postcode = b.postcode
AND LEVENSHTEIN_RATIO( a.last_name , b.last_name ) > 60
GROUP BY a.user_id) as sub
GROUP BY idbase;
Running on full data set to test..
Upvotes: 0
Reputation: 840
Can you be more clear in your requirement??
anyways try using Subquery like
Select CONCAT (user.i,',)
from
(Select Distinct ...... --- ur old code ---- )
Thanks, Shanmugam
Upvotes: 2
Reputation: 121922
You can include ORDER BY clause into the GROUP_CONCAT function -
... GROUP_CONCAT(b.user_id SEPARATOR ' , ' ORDER BY b.user_id)
Upvotes: 0