Reputation: 9
I have this table called file
id integer primary key,
created_on timestamp
updated_on timestamp
file_name text not null
path text not null unique
hash text not null
size bigint not null
size_mb bigint not null
I want to get all the records with a unique hash
value (that is a single instance of duplicated files) and then sum the values in the size
column to the total bytes of disk space I'll need to back up a single copy of each file.
Upvotes: 0
Views: 93
Reputation: 60462
This returns only unique hashes, i.e. no duplicates exist:
select *,
-- group sum of all files
sum(size) over ()
from
(
select *,
-- rows per hash
count(*) over (partition by hash) as cnt
from file
) as dt
where cnt = 1
Edit: This return only one row per hash:
select *,
-- group sum of all files
sum(size) over ()
from
(
select *,
-- unique number per hash
row_number(*) over (partition by hash order by hash) as rn
from file
) as dt
where rn = 1
Both queries are Standard SQL, but PostgreSQL also supports proprietary syntax:
select *,
-- group sum of all files
sum(size) over ()
from
(
select DISTINCT ON (hash) *
from file
order by hash
) as dt
Upvotes: 1