Reputation: 61
I have a list of compressed datasets and I need to know their size when they are uncompressed.
I tried proc contents and the attrn function, but when applied to compressed datasets they only report the size in the current (compressed) state.
I searched online and found techniques for estimating the approximate size of the datasets, but none that gives the exact size.
For example, the compressed dataset is 62 MB whereas the uncompressed dataset is 629 MB. I now have only the compressed dataset and I want to find the size of the uncompressed dataset without actually uncompressing the data.
Is this possible? Please share your thoughts. Thanks in advance.
Upvotes: 1
Views: 648
Reputation: 1
Find the length of each variable; the total of those lengths is the length of each row. Multiplying that by the number of rows gives the approximate size of the uncompressed table.
E.g. a table has variable x and variable y and 1,000,000 observations in total. The length of x is 10 and the length of y is 20, so one observation takes 30 bytes.
So the total size will be 30 * 1,000,000 = 30,000,000 bytes = 30 MB.
Remember that an uncompressed table still carries some page and observation overhead (less than a compressed one), so the actual size will be slightly more than 30 MB.
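The row-length estimate can be sketched in SAS as below. This is a minimal sketch, not a definitive implementation: the library `WORK` and dataset name `MYDATA` are hypothetical placeholders, and the result ignores page and header overhead, so it is a lower-bound approximation.

```sas
/* Hypothetical names: replace WORK / MYDATA with your own library and dataset. */
proc sql noprint;
    /* Sum of the variable lengths = bytes per observation */
    select sum(length) into :row_bytes trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'MYDATA';

    /* Number of observations recorded in the dataset header */
    select nobs into :n_obs trimmed
    from dictionary.tables
    where libname = 'WORK' and memname = 'MYDATA';
quit;

/* Bytes per row times number of rows; page overhead is not included */
%put NOTE: Estimated uncompressed size = %sysevalf(&row_bytes * &n_obs) bytes;
```

Note that `libname` and `memname` are stored in upper case in the dictionary tables, so the literals in the `where` clauses must be upper case too.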
Upvotes: 0
Reputation: 376
Run proc contents and calculate sum(length) * nobs, i.e. bytes per row times number of rows. The true size of the table is just slightly larger (by some constant amount of bytes I think; EDIT: it's not constant. But if you need approximate numbers, this approach will do).
Another option is to use the size of the compressed table and the compression ratio (you should see it in the log when you create/modify the table). If the log says compression decreased the size by p percent, divide the compressed table size by (1 - p/100) to get the approximate uncompressed size.
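As a rough illustration of the compression-ratio arithmetic, here is a small sketch. The 62 MB figure is taken from the question; the percentage is a hypothetical value chosen to be consistent with the question's 629 MB, and in practice you would read it from the log note written when the dataset was created.

```sas
data _null_;
    compressed_mb = 62;       /* compressed size, from the question */
    pct_decrease  = 90.14;    /* hypothetical: "decreased size by ... percent" from the log */

    /* The compressed file holds (1 - p/100) of the original size */
    est_uncompressed_mb = compressed_mb / (1 - pct_decrease / 100);

    put est_uncompressed_mb= 8.1;
run;
```

With these inputs the estimate comes out at roughly 629 MB. It is only as accurate as the percentage in the log, which is rounded.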
Upvotes: 2
Reputation: 9569
I doubt it's possible to get an exact answer, but you should be able to produce a reasonably accurate estimate without too much work.
SAS datasets are compressed row-wise. Select a small representative sample of rows from your compressed dataset, making a new uncompressed dataset, find the size of it, and then scale by the inverse of the sample rate to estimate the size of the full dataset. This won't be exact, as some rows compress better than others, but you should be able to get a more accurate estimate with a larger sample.
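A minimal sketch of the sampling approach, under some assumptions: `WORK.MYDATA` is a hypothetical name for the compressed dataset, the 1% rate and the seed are arbitrary, and the `filesize` column of `dictionary.tables` is assumed to be available in your SAS version (if it is not, the file size reported by proc contents can be used instead).

```sas
%let rate = 0.01;   /* sampling rate: keep about 1% of the rows */

/* Write a random sample of rows to a new, uncompressed dataset */
data work.sample_uncomp (compress=no);
    set work.mydata;               /* hypothetical compressed source */
    if ranuni(12345) < &rate;      /* simple random row sample, fixed seed */
run;

/* Read the size of the uncompressed sample from the dictionary tables */
proc sql noprint;
    select filesize into :sample_bytes trimmed
    from dictionary.tables
    where libname = 'WORK' and memname = 'SAMPLE_UNCOMP';
quit;

/* Scale by the inverse of the sampling rate */
%put NOTE: Estimated uncompressed size = %sysevalf(&sample_bytes / &rate) bytes;
```

The fixed header overhead of the sample file gets scaled up along with the data, so very small samples will tend to overestimate; a larger sample reduces that effect.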
Upvotes: 0