Reputation: 2102
I have a table in RedShift. How can I see how many disk-space it uses?
Upvotes: 23
Views: 22412
Reputation: 36166
I know this question is old and already has an answer accepted but I must point out that the answer is wrong. What the query is outputting there as "mb" is actually the "number of blocks". The answer would be correct only if the block size is 1MB (which is the default).
If the block size is different (in my case for example is 256K), you have to multiply the number of blocks by its size in bytes. I suggest the following change to your query where I multiply the number of blocks by the block size in bytes (262144 bytes) and then divide by (1024*1024) to output the total in megabytes:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes as previous_wrong_value,
(b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes",
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(blocknum) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Upvotes: 14
Reputation: 11
Just thought I would expand on this as I am facing an issue of uneven distribution. I've added some links and fields to enable analysis of space by node and slice. Also added are max/min values and number of values per slice for column 0.
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
a.node,
a.slice,
b.mbytes,
a.rows,
a.num_values,
a.minvalue,
a.maxvalue
from
(select
a.db_id,
a.id,
s.node,
s.slice,
a.name,
d.num_values,
d.minvalue,
d.maxvalue,
sum(rows) as rows
from stv_tbl_perm a
inner join stv_slices s on a.slice = s.slice
inner join (
select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue
from svv_diskusage
where col = 0
group by 1, 2) d on a.id = d.tbl and a.slice = d.slice
group by 1, 2, 3, 4, 5, 6, 7, 8
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
slice,
count(*) as mbytes
from stv_blocklist
group by tbl, slice
) b on a.id = b.tbl
and a.slice = b.slice
order by mbytes desc, a.db_id, a.name, a.node;
Upvotes: 0
Reputation: 2828
Adding owner and a schema filter to the above query:
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from
(select
db_id,
id,
name,
sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
count as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Upvotes: 1
Reputation: 8647
Use queries from this presentation: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices
Analyze disk space usage for cluster:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Analyze Table distribution between nodes:
select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = '__INSERT__TABLE__NAME__HERE__' and col = 0
order by slice, col;
Upvotes: 49