How to measure table space on disk in RedShift / ParAccel

I have a table in RedShift. How can I see how many disk-space it uses?

Upvotes: 23

Answers (4)

Diego

Reputation: 36166

I know this question is old and already has an answer accepted but I must point out that the answer is wrong. What the query is outputting there as "mb" is actually the "number of blocks". The answer would be correct only if the block size is 1MB (which is the default).

If the block size is different (in my case for example is 256K), you have to multiply the number of blocks by its size in bytes. I suggest the following change to your query where I multiply the number of blocks by the block size in bytes (262144 bytes) and then divide by (1024*1024) to output the total in megabytes:

select
    trim(pgdb.datname) as Database,
    trim(pgn.nspname) as Schema,
    trim(a.name) as Table,
    b.mbytes as previous_wrong_value,
    (b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes", 
    a.rows
from (
    select db_id, id, name, sum(rows) as rows
    from stv_tbl_perm a
    group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
    select tbl, count(blocknum) as mbytes
    from stv_blocklist
    group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;

Upvotes: 14

Pete Griffiths

Reputation: 11

Just thought I would expand on this as I am facing an issue of uneven distribution. I've added some links and fields to enable analysis of space by node and slice. Also added are max/min values and number of values per slice for column 0.

select
 cast(use.usename as varchar(50)) as owner, 
 trim(pgdb.datname) as Database,
 trim(pgn.nspname) as Schema,
 trim(a.name) as Table,
 a.node,
 a.slice,
 b.mbytes,
 a.rows,
 a.num_values,
 a.minvalue,
 a.maxvalue
from 
 (select 
   a.db_id,
   a.id, 
   s.node,
   s.slice,
   a.name,
   d.num_values,
   d.minvalue,
   d.maxvalue,
   sum(rows) as rows
  from stv_tbl_perm a
  inner join stv_slices s on a.slice = s.slice
  inner join (
    select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue
    from svv_diskusage
    where col = 0
    group by 1, 2) d on a.id = d.tbl and a.slice = d.slice
  group by 1, 2, 3, 4, 5, 6, 7, 8
 ) as a
 join pg_class as pgc on pgc.oid = a.id
 left join pg_user use on (pgc.relowner = use.usesysid)
 join pg_namespace as pgn on pgn.oid = pgc.relnamespace 
   -- leave out system schemas
   and pgn.nspowner > 1
 join pg_database as pgdb on pgdb.oid = a.db_id
 join 
  (select 
    tbl,
    slice,
    count(*) as mbytes
   from stv_blocklist
   group by tbl, slice
 ) b on a.id = b.tbl
    and a.slice = b.slice
order by mbytes desc, a.db_id, a.name, a.node;

Upvotes: 0

mike_pdb

Reputation: 2828

Adding owner and a schema filter to the above query:

select
 cast(use.usename as varchar(50)) as owner, 
 trim(pgdb.datname) as Database,
 trim(pgn.nspname) as Schema,
 trim(a.name) as Table,
 b.mbytes,
 a.rows
from 
 (select 
   db_id,
   id, 
   name,
   sum(rows) as rows
  from stv_tbl_perm a
  group by db_id, id, name
 ) as a
 join pg_class as pgc on pgc.oid = a.id
 left join pg_user use on (pgc.relowner = use.usesysid)
 join pg_namespace as pgn on pgn.oid = pgc.relnamespace 
   -- leave out system schemas
   and pgn.nspowner > 1
 join pg_database as pgdb on pgdb.oid = a.db_id
 join 
  (select 
    tbl,
    count as mbytes
   from stv_blocklist
   group by tbl
 ) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;

Upvotes: 1

Tomasz Tybulewicz

Reputation: 8647

Use queries from this presentation: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices

Analyze disk space usage for cluster:

select
    trim(pgdb.datname) as Database,
    trim(pgn.nspname) as Schema,
    trim(a.name) as Table,
    b.mbytes,
    a.rows
from (
    select db_id, id, name, sum(rows) as rows
    from stv_tbl_perm a
    group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
    select tbl, count(*) as mbytes
    from stv_blocklist
    group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;

Analyze Table distribution between nodes:

select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = '__INSERT__TABLE__NAME__HERE__' and col = 0
order by slice, col;

Upvotes: 49

How to measure table space on disk in RedShift / ParAccel

Answers (4)

Related Questions