Reputation: 153
I need to understand how much memory my table will use. I know the arithmetic mean of records. I know what types of records have. But I can't understand how to calculate mean/max size of the record. Is this possible to calculate it if I get mean/max size of every column of record?
Upvotes: 1
Views: 2189
Reputation: 38335
get mean/max size of every column
Also you can analyze table and fetch stats for column.
Analyze table:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
Use DESCRIBE FORMATTED:
desc formatted tablename column_name
It will print these figures:
data_type, min, max, num_nulls, distinct_count, avg_col_len, max_col_len, num_trues, num_falses
Unfortunately, it works for single column at a time and you need to repeat describe command to get all columns stats.
Upvotes: 1
Reputation: 7387
I can give you average(mean) of each row in a hive table. It is possible to calculate max/min size of each row but its going to bit difficult.
You can calculate average size using - total size in KB / count of table.
show tblproperties tab ("totalSize") ; --result is 1117098374
select 1117098374/(1024*count(*)) row_sz_kb from tab ;
To calculate individual row size, you need to sum-
space consumed by data type of each column * length of the data in the column.
This isnt easy since space consumed can be tricky for different data types.
But i have seen in my system, if you store 100 in a column and 999 in same column, their space consumed will vary.
You can follow below link to bytes consumed by each data type but i would take them with a grain of salt and validate myself.
http://hadooptutorial.info/hive-data-types-examples/
Upvotes: 0