FeoJun
FeoJun

Reputation: 153

How to calculate size of the one record in hive?

I need to understand how much memory my table will use. I know the arithmetic mean of records. I know what types of records have. But I can't understand how to calculate mean/max size of the record. Is this possible to calculate it if I get mean/max size of every column of record?

Upvotes: 1

Views: 2189

Answers (2)

leftjoin
leftjoin

Reputation: 38335

get mean/max size of every column

Also you can analyze table and fetch stats for column.

Analyze table:

ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;

Use DESCRIBE FORMATTED:

desc formatted  tablename  column_name

It will print these figures:

data_type, min, max, num_nulls, distinct_count, avg_col_len, max_col_len, num_trues, num_falses

Unfortunately, it works for single column at a time and you need to repeat describe command to get all columns stats.

Upvotes: 1

Koushik Roy
Koushik Roy

Reputation: 7387

I can give you average(mean) of each row in a hive table. It is possible to calculate max/min size of each row but its going to bit difficult.
You can calculate average size using - total size in KB / count of table.

show tblproperties   tab ("totalSize")  ; --result is 1117098374
select 1117098374/(1024*count(*)) row_sz_kb from tab ;

To calculate individual row size, you need to sum-
space consumed by data type of each column * length of the data in the column.

This isnt easy since space consumed can be tricky for different data types. But i have seen in my system, if you store 100 in a column and 999 in same column, their space consumed will vary.
You can follow below link to bytes consumed by each data type but i would take them with a grain of salt and validate myself. http://hadooptutorial.info/hive-data-types-examples/

Upvotes: 0

Related Questions