Vlad Schnakovszki
Vlad Schnakovszki

Reputation: 8601

SQL Query Duplicate Result

I am doing a project using MySQL 5. The requirement is the following:

Give the user names, device types, OS version and fruit involved in picks, where users had the same device type, were running iOS 4 or 4.1, and picked the same fruit

The relevant tables are as follows:

User: {uID: INT, name: VARCHAR(45), deviceOS: VARCHAR(45), deviceType: VARCHAR(45)}

Pick: {uID: INT, ts: TIMESTAMP, fruit: VARCHAR(45)}

(Primary keys in bold. uID in Pick is a foreign key of uID in User.)

I am doing the following query:

SELECT DISTINCT NAME1, OS1, DEV1, NAME2, OS2, DEV2, P1.fruit FROM Pick AS P1, Pick AS P2, 

(SELECT U1.uID AS User1, U1.name AS NAME1, U1.deviceOS AS OS1, U1.deviceType AS DEV1, 
        U2.uID AS User2, U2.name AS NAME2, U2.deviceOS AS OS2, U2.deviceType AS DEV2
FROM User AS U1, User AS U2 
WHERE (U1.uID != U2.uID) AND
      (U1.deviceType = U2.deviceType) AND
      (U1.deviceOS = "4" OR U1.deviceOS = "4.1") AND
      (U2.deviceOS = "4" OR U2.deviceOS = "4.1")) AS PartialResult

WHERE (P1.uID = PartialResult.User1) AND
      (P2.uID = PartialResult.User2) AND
      (P1.fruit = P2.fruit)

This returns the following result, but as you see, it is in some way "duplicated":

I have tried solving this using GROUP BY fruit but it will not return the correct result on the general case. Limit 1 also would not work on the general case. So after numerous hours trying to figure this out, I must ask:

Is there a way to prevent this kind of duplication on the general case?

Thank you!

Upvotes: 0

Views: 96

Answers (2)

ean5533
ean5533

Reputation: 8994

The problem you're encountering is that every single row is going to be duplicated, a--b and b--a. You need some way of specifying that you only want one or the other, but the question is, which one? Do you have a preference whether Priscilla is listed before Marcia, or vice versa?

If there is no preference, then you can just make up some arbitrary rule that will only allow one or the other to go through. For example, you can compare names and only grab rows where the first name is lexicographically before the second (see last line):

SELECT DISTINCT NAME1, OS1, DEV1, NAME2, OS2, DEV2, P1.fruit FROM Pick AS P1, Pick AS P2, 

(SELECT U1.uID AS User1, U1.name AS NAME1, U1.deviceOS AS OS1, U1.deviceType AS DEV1, 
        U2.uID AS User2, U2.name AS NAME2, U2.deviceOS AS OS2, U2.deviceType AS DEV2
FROM User AS U1, User AS U2 
WHERE (U1.uID != U2.uID) AND
      (U1.deviceType = U2.deviceType) AND
      (U1.deviceOS = "4" OR U1.deviceOS = "4.1") AND
      (U2.deviceOS = "4" OR U2.deviceOS = "4.1")) AS PartialResult

WHERE (P1.uID = PartialResult.User1) AND
      (P2.uID = PartialResult.User2) AND
      (P1.fruit = P2.fruit) AND
      (STRCMP(NAME1, NAME2) < 0)

Of course, you can implement any rule you want that picks one or the other. @igelkott's answer solves the problem the same way by enforcing person 1's uID to be higher than person 2's uID, which is very reasonable (and faster than doing string compares).

Upvotes: 2

igelkott
igelkott

Reputation: 1287

Instead of U1.uID != U2.uID, write U1.uID > U2.uID.

Upvotes: 4

Related Questions