Reputation: 411
Bellow is an example table.
ID FROM TO DATE
1 Number1 Number2 somedate
2 Number2 Number1 somedate
3 Number2 Number1 somedate
4 Number3 Number1 somedate
5 Number3 Number2 somedate
Expected result is to get 1 row for each unique pair of TO and FROM columns
Example result if ordered by ID ASC
(1,Number1,Number2)
(4,Number3,Number1)
(5,Number3,Number2)
Ok I have found how to do this with the following query
SELECT * FROM table GROUP BY LEAST(to,from), GREATEST(to,from)
However I am not able to get the most recent record for every unique pair.
I have tried with order by ID desc
but it returns the first found row for unique pair.
Upvotes: 3
Views: 3190
Reputation: 32695
This answer was originally inspired by Get records with max value for each group of grouped SQL results but then I looked further and came up with the correct solution.
CREATE TABLE T
(`id` int, `from` varchar(7), `to` varchar(7), `somedate` datetime)
;
INSERT INTO T
(`id`, `from`, `to`, `somedate`)
VALUES
(1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
(2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
(3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
(4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
(5, 'Number3', 'Number2', '2015-01-05 00:00:00');
Tested on MySQL 5.6.19
SELECT *
FROM
(
SELECT *
FROM T
ORDER BY LEAST(`to`,`from`), GREATEST(`to`,`from`), somedate DESC
) X
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
Result set
id from to somedate
3 Number2 Number1 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
But, this relies on some shady behavior of MySQL, which will be changed in future versions. MySQL 5.7 rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY
is disabled), it works like the previous versions, but still it is not guaranteed: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."
So, the correct answer seems to be this:
SELECT T.*
FROM
T
INNER JOIN
(
SELECT
LEAST(`to`,`from`) AS LowVal,
GREATEST(`to`,`from`) AS HighVal,
MAX(somedate) AS MaxDate
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) v
ON T.somedate = v.MaxDate
AND (T.From = v.LowVal OR T.From = v.HighVal)
AND (T.To = v.LowVal OR T.To = v.HighVal)
Result set is the same as above, but in this case it is guaranteed to stay like this, while before you could easily get different date and id for row Number2, Number1
, depending on what indexes you have on the table.
It will work as expected until you have two rows in the original data that have exactly the same somedate
and to
and from
.
Let's add another row:
INSERT INTO T (`id`, `from`, `to`, `somedate`)
VALUES (6, 'Number1', 'Number2', '2015-01-03 00:00:00');
The query above would return two rows for 2015-01-03
:
id from to somedate
3 Number2 Number1 2015-01-03
6 Number1 Number2 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
To fix this we need a method to choose only one row in the group. In this example we can use unique ID
to break the tie. If there are more than one rows in the group with the same maximum date we will choose the row with the largest ID.
The inner-most sub-query called Groups
simply returns all groups, like original query in the question. Then we add one column id
to this result set, and we choose id
that belongs to the same group and has highest somedate
and then highest id
, which is done by ORDER BY
and LIMIT
. This sub-query is called GroupsWithIDs
. Once we have all groups and an id
of the correct row for each group we join
this to the original table to fetch the rest of the column for found id
s.
final query
SELECT T.*
FROM
(
SELECT
Groups.N1
,Groups.N2
,
(
SELECT T.id
FROM T
WHERE
LEAST(`to`,`from`) = Groups.N1 AND
GREATEST(`to`,`from`) = Groups.N2
ORDER BY T.somedate DESC, T.id DESC
LIMIT 1
) AS id
FROM
(
SELECT LEAST(`to`,`from`) AS N1, GREATEST(`to`,`from`) AS N2
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) AS Groups
) AS GroupsWithIDs
INNER JOIN T ON T.id = GroupsWithIDs.id
final result set
id from to somedate
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
6 Number1 Number2 2015-01-03
Upvotes: 0
Reputation: 1462
SQL fiddle isn't working for some reason so in the mean time you will need to help me to help you.
Assuming that the following statement works
SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
then you could join to that as
select t.*
from
table t
inner join
(SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
) v
on t.date = v.MaxDate
and (t.From = v.LowVal or t.From = v.HighVal)
and (t.To = v.LowVal or t.To= v.HighVal)
Upvotes: 5
Reputation: 474
I believe the following would work, my knowledge is with Microsoft SQL Server, not MySQL. If MySQL lacks one of these, let me know and I'll delete the answer.
DECLARE @Table1 TABLE(
ID int,
Too varchar(10),
Fromm varchar(10),
Compared int)
INSERT INTO @Table1 values (1, 'John','Mary', 2), (2,'John', 'Mary', 1), (3,'Sue','Charles',1), (4,'Mary','John',3)
SELECT ID, Too, Fromm, Compared
FROM @Table1 as t
INNER JOIN
(
SELECT
CASE WHEN Too < Fromm THEN Too+Fromm
ELSE Fromm+Too
END as orderedValues, MIN(compared) as minComp
FROM @Table1
GROUP BY CASE WHEN Too < Fromm THEN Too+Fromm
ELSE Fromm+Too
END
) ordered ON
ordered.minComp = t.Compared
AND ordered.orderedValues =
CASE
WHEN Too < Fromm
THEN Too+Fromm
ELSE
Fromm+Too
END
I used an int instead of time value, but it would work the same. It's dirty, but it's giving me the results I expected.
The basics of it, is to use a derived query where you take the two columns you want to get unique values for and use a case statement to combine them into a standard format. In this case, earlier alphabetical concatenated with the later value alphabetically. Use that value to get the minimum value we are looking for, join back to the original table to get the values separated out again plus whatever else is in that table. It is assuming the value we are aggregating is going to be unique, so in this case if there was (1, 'John', 'Mary', 2) and (2, 'Mary', 'John', 2), it would kind of break and return 2 records for that couple.
Upvotes: 0