Sudarshan Ts

Reputation: 11

How to analyze the contents of an fsimage via Hive queries

Help needed, please

I have downloaded the fsimage and converted it into a delimited CSV file via the OIV tool. I also created a Hive table and loaded the CSV file into it.

I am not very familiar with SQL, so querying the data is difficult.

e.g. each record in the file looks something like this:

/tmp/hive/ltonakanyan/9c01cc22-55ef-4410-9f55-614726869f6d/hive_2017-05-08_08-44-39_680_3710282255695385702-113/-mr-10000/.hive-staging_hive_2017-05-08_08-44-39_680_3710282255695385702-113/-ext-10001/000044_0.deflate|3|2017-05-0808:45|2017-05-0808:45|134217728|1|176|0|0|-rw-r-----|ltonakanyan|hdfs

/data/lz/cpi/ofz/zd/cbt_ca_verint/new_data/2017-09-27/253018001769667.xml | 3| 2017-09-2723:41| 2017-09-2817:09| 134217728| 1| 14549| 0| 0| -rw-r----- | bc55_ah_appid| hdfs

Table description is:

| hdfspath           | string |
| replication        | int    |
| modificationtime   | string |
| accesstime         | string |
| preferredblocksize | int    |
| blockscount        | int    |
| filesize           | bigint |
| nsquota            | bigint |
| dsquota            | bigint |
| permissionx        | string |
| userx              | string |
| groupx             | string |
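
The table was created along these lines (a rough sketch; the actual table name, CSV path and storage details may differ):

-- sketch of the DDL used for the '|'-delimited OIV output; the input path is hypothetical
create table example (
  hdfspath           string,
  replication        int,
  modificationtime   string,
  accesstime         string,
  preferredblocksize int,
  blockscount        int,
  filesize           bigint,
  nsquota            bigint,
  dsquota            bigint,
  permissionx        string,
  userx              string,
  groupx             string
)
row format delimited fields terminated by '|'
stored as textfile;

load data local inpath '/path/to/fsimage.csv' into table example;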

I need to know how to query the total filesize for just /tmp and /data at the first level, and then drill down to the second level ( /tmp/hive, /data/lz ) and subsequent levels, with the filesize at each level.

I created something like this:

select substr(hdfspath, 2, instr(substr(hdfspath,2), '/')-1) zone,
       sum(filesize)
from example
group by substr(hdfspath, 2, instr(substr(hdfspath,2), '/')-1);

But it is not giving the data I expect, and the file sizes are all in bytes.

Upvotes: 0

Views: 769

Answers (2)

jhonshonjs

Reputation: 11

-- first-level rollup: total size (in GB) per top-level directory
select joinedpath, sumsize
from
(
  select joinedpath, round(sum(filesize)/1024/1024/1024, 2) as sumsize
  from
  (
    -- split(hdfspath,'\/')[1] is the first path component, e.g. 'tmp' or 'data'
    select concat('/', split(hdfspath,'\/')[1]) as joinedpath, accesstime, filesize, userx
    from default.hdfs_meta_d
  ) t
  where joinedpath != 'null'
  group by joinedpath
) h;

Please check the query above; it should give you the total size (in GB) per first-level directory.
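
For the second level ( /tmp/hive, /data/lz, ... ) the same idea extends by concatenating the first two path components; a minimal sketch, assuming the same default.hdfs_meta_d table as above:

-- second-level rollup: total size (in GB) per /<level1>/<level2> prefix
select joinedpath, round(sum(filesize)/1024/1024/1024, 2) as sumsize
from
(
  select concat('/', split(hdfspath,'\/')[1], '/', split(hdfspath,'\/')[2]) as joinedpath,
         filesize
  from default.hdfs_meta_d
  where split(hdfspath,'\/')[2] is not null
) t
group by joinedpath;

Deeper levels follow the same pattern by appending further path components to the group key.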

Upvotes: 1

SachinJose

Reputation: 8522

This job is failing due to a heap memory error. Try increasing the heap size before executing the hdfs oiv command:

export HADOOP_OPTS="-Xmx4096m"

If the command is still failing, you might need to move the fsimage to a different machine/server that has more memory and increase the heap size there using the same environment variable.

Upvotes: 0
