Reputation: 110412
Let's say I have the following data:
md5 file_size
1234 127.4
1234 127.4
1234 127.4
909 18.2
909 18.2
104 207
I want to find out how much data I have in duplicates -- that is -- if we have one of each of the files (unique by md5) how much space do we save? The answer should be:
md5 size_saved
1234 254.8
909 18.2
104 0
Here is the base query I have thus far:
SELECT file_size, SUM(file_size) total_size FROM files GROUP BY md5
I think the simplest way to do this is to do ABS(file_size - total_size)
, but I'm not sure how to do this without using a sub-select or an outer-query that has access to both the aggregate size and the single file size. What would be the proper way to do this, or is the only way with a subselect?
Upvotes: 3
Views: 109
Reputation: 522311
You could use:
SELECT
md5,
SUM(file_size) - MIN(file_size) AS size_saved
FROM yourTable
GROUP BY
md5;
Note that my answer assumes that all records for a given md5
would always have the same file_size
values, in the event that there be more than one record. If not, then my answer would not work, but we would have to redefine the logic anyway in this case.
Upvotes: 6
Reputation: 6541
You can use file_size
in GROUP BY, so you can easily include in select statement.
SELECT md5, SUM(file_size) - file_size AS size_saved
FROM files
GROUP BY md5, file_size
Upvotes: 3
Reputation: 129
I supposed if it is duplicate, the file size is the same, so you can try something like this :
select md5,file_size*(count(file_size)-1) as space_saved from files group by md5,file_size
Upvotes: 3