Vladimir Gatev
Vladimir Gatev

Reputation: 411

SQL How to group by two columns

Bellow is an example table.

ID   FROM       TO         DATE  
1    Number1    Number2    somedate
2    Number2    Number1    somedate
3    Number2    Number1    somedate
4    Number3    Number1    somedate
5    Number3    Number2    somedate

Expected result is to get 1 row for each unique pair of TO and FROM columns

Example result if ordered by ID ASC

(1,Number1,Number2)
(4,Number3,Number1)
(5,Number3,Number2)

Ok I have found how to do this with the following query

SELECT * FROM table GROUP BY LEAST(to,from), GREATEST(to,from)

However I am not able to get the most recent record for every unique pair.

I have tried with order by ID desc but it returns the first found row for unique pair.

Upvotes: 3

Views: 3190

Answers (3)

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

This answer was originally inspired by Get records with max value for each group of grouped SQL results but then I looked further and came up with the correct solution.

CREATE TABLE T
    (`id` int, `from` varchar(7), `to` varchar(7), `somedate` datetime)
;

INSERT INTO T
    (`id`, `from`, `to`, `somedate`)
VALUES
    (1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
    (2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
    (3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
    (4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
    (5, 'Number3', 'Number2', '2015-01-05 00:00:00');

Tested on MySQL 5.6.19

SELECT * 
FROM 
    (
        SELECT * 
        FROM T 
        ORDER BY LEAST(`to`,`from`), GREATEST(`to`,`from`), somedate DESC
    ) X
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)

Result set

id from    to      somedate
3  Number2 Number1 2015-01-03
4  Number3 Number1 2015-01-04
5  Number3 Number2 2015-01-05

But, this relies on some shady behavior of MySQL, which will be changed in future versions. MySQL 5.7 rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY is disabled), it works like the previous versions, but still it is not guaranteed: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."

So, the correct answer seems to be this:

SELECT T.*
FROM 
    T
    INNER JOIN 
        (
        SELECT 
            LEAST(`to`,`from`) AS LowVal, 
            GREATEST(`to`,`from`) AS HighVal, 
            MAX(somedate) AS MaxDate
        FROM T
        GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
        ) v
        ON T.somedate = v.MaxDate
        AND (T.From = v.LowVal OR T.From = v.HighVal)
        AND (T.To = v.LowVal OR T.To = v.HighVal)

Result set is the same as above, but in this case it is guaranteed to stay like this, while before you could easily get different date and id for row Number2, Number1, depending on what indexes you have on the table.

It will work as expected until you have two rows in the original data that have exactly the same somedate and to and from.

Let's add another row:

INSERT INTO T (`id`, `from`, `to`, `somedate`)
VALUES (6, 'Number1', 'Number2', '2015-01-03 00:00:00');

The query above would return two rows for 2015-01-03:

id from    to      somedate
3  Number2 Number1 2015-01-03
6  Number1 Number2 2015-01-03
4  Number3 Number1 2015-01-04
5  Number3 Number2 2015-01-05

To fix this we need a method to choose only one row in the group. In this example we can use unique ID to break the tie. If there are more than one rows in the group with the same maximum date we will choose the row with the largest ID.

The inner-most sub-query called Groups simply returns all groups, like original query in the question. Then we add one column id to this result set, and we choose id that belongs to the same group and has highest somedate and then highest id, which is done by ORDER BY and LIMIT. This sub-query is called GroupsWithIDs. Once we have all groups and an id of the correct row for each group we join this to the original table to fetch the rest of the column for found ids.

final query

SELECT T.*
FROM
    (
    SELECT
        Groups.N1
        ,Groups.N2
        ,
        (
            SELECT T.id
            FROM T
            WHERE
                LEAST(`to`,`from`) = Groups.N1 AND
                GREATEST(`to`,`from`) = Groups.N2
            ORDER BY T.somedate DESC, T.id DESC
            LIMIT 1
        ) AS id
    FROM
        (
            SELECT LEAST(`to`,`from`) AS N1, GREATEST(`to`,`from`) AS N2
            FROM T 
            GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
        ) AS Groups
    ) AS GroupsWithIDs
    INNER JOIN T ON T.id = GroupsWithIDs.id

final result set

id from    to      somedate
4  Number3 Number1 2015-01-04
5  Number3 Number2 2015-01-05
6  Number1 Number2 2015-01-03

Upvotes: 0

G B
G B

Reputation: 1462

SQL fiddle isn't working for some reason so in the mean time you will need to help me to help you.

Assuming that the following statement works

SELECT 
    LEAST(to,from) as LowVal, 
    GREATEST(to,from) as HighVal, 
    MAX(date) as MaxDate
FROM table 
GROUP BY LEAST(to,from), GREATEST(to,from)

then you could join to that as

select t.*
from 
    table t
    inner join 
        (SELECT 
            LEAST(to,from) as LowVal, 
            GREATEST(to,from) as HighVal, 
            MAX(date) as MaxDate
        FROM table 
        GROUP BY LEAST(to,from), GREATEST(to,from)
        ) v
        on t.date = v.MaxDate
        and (t.From = v.LowVal or t.From = v.HighVal)
        and (t.To = v.LowVal or t.To= v.HighVal)

Upvotes: 5

David Jacobsen
David Jacobsen

Reputation: 474

I believe the following would work, my knowledge is with Microsoft SQL Server, not MySQL. If MySQL lacks one of these, let me know and I'll delete the answer.

DECLARE @Table1 TABLE(
ID int,
Too varchar(10),
Fromm varchar(10),
Compared int)

INSERT INTO @Table1 values (1, 'John','Mary', 2), (2,'John', 'Mary', 1), (3,'Sue','Charles',1), (4,'Mary','John',3)


SELECT ID, Too, Fromm, Compared
FROM @Table1 as t
INNER JOIN
(
SELECT
    CASE WHEN Too < Fromm THEN Too+Fromm
    ELSE Fromm+Too
    END as orderedValues, MIN(compared) as minComp
FROM @Table1
GROUP BY    CASE WHEN Too < Fromm THEN Too+Fromm
ELSE Fromm+Too
END
) ordered ON 
ordered.minComp = t.Compared 
AND ordered.orderedValues = 
        CASE 
            WHEN Too < Fromm 
                THEN Too+Fromm
            ELSE 
                Fromm+Too
        END

I used an int instead of time value, but it would work the same. It's dirty, but it's giving me the results I expected.

The basics of it, is to use a derived query where you take the two columns you want to get unique values for and use a case statement to combine them into a standard format. In this case, earlier alphabetical concatenated with the later value alphabetically. Use that value to get the minimum value we are looking for, join back to the original table to get the values separated out again plus whatever else is in that table. It is assuming the value we are aggregating is going to be unique, so in this case if there was (1, 'John', 'Mary', 2) and (2, 'Mary', 'John', 2), it would kind of break and return 2 records for that couple.

Upvotes: 0

Related Questions