Reputation: 2523
I have a table that contains a list of files and their sizes, and my goal is to select out only those files where the filename and size are equal. So I came up with this Select statement:
SELECT FileName, FileSize FROM Files WHERE FileName IN
(
SELECT FileName FROM Files WHERE FileSize > 10240
GROUP BY FileName, FileSize HAVING Count(*) > 1
)
ORDER BY FileName, FileSize;
The problem is, as I was sifting through the results, I saw these records:
022704.jpg 40960
022704.jpg 42080
022704.jpg 42080
022704.jpg 42080
022704.jpg 42080
Wher the top record has a size different than the rest. So I ran this query
SELECT FileName, FileSize from Files WHERE FileName = '022704.jpg' AND FileSize = 40960;
And sure enough, there is only one record with that filename and that file size.
What am I doing wrong in my first select statement?
Edit:
I see what I'm doing wrong now - The query will return the record as long as the filename is in the subquery without respecting the subquery in the final results ... but I don't know how to fix it. When I modified the query to look like this
SELECT FileName, FileSize FROM Files WHERE FileName IN
(
SELECT FileName FROM Files WHERE FileSize > 10024
GROUP BY FileName, FileSize HAVING Count(*) > 1
)
AND FileSize in
(
SELECT FileSize FROM Files WHERE FileSize > 10024
GROUP BY FileName, FileSize HAVING Count(*) > 1
)
ORDER BY FileName, FileSize;
The query is still running after 5 minutes of waiting.
Upvotes: 0
Views: 31
Reputation: 4694
Size needs to be part of the IN
logic as well. Try this:
SELECT FileName, FileSize
FROM Files WHERE (FileName, FileSize) IN
(
SELECT FileName, FileSize FROM Files WHERE FileSize > 10240
GROUP BY FileName, FileSize HAVING Count(*) > 1
)
ORDER BY FileName, FileSize
;
For the performance issue, make sure you have an index on (FileName, FileSize)
.
Upvotes: 1