Reputation: 11
Help needed, please
I downloaded the fsimage and converted it into a delimited CSV file with the OIV tool. I also created a Hive table and loaded the CSV file into it.
I am not very familiar with SQL, so querying the data is difficult.
For example, each record in the file looks like this:
/tmp/hive/ltonakanyan/9c01cc22-55ef-4410-9f55-614726869f6d/hive_2017-05-08_08-44-39_680_3710282255695385702-113/-mr-10000/.hive-staging_hive_2017-05-08_08-44-39_680_3710282255695385702-113/-ext-10001/000044_0.deflate|3|2017-05-08 08:45|2017-05-08 08:45|134217728|1|176|0|0|-rw-r-----|ltonakanyan|hdfs
/data/lz/cpi/ofz/zd/cbt_ca_verint/new_data/2017-09-27/253018001769667.xml|3|2017-09-27 23:41|2017-09-28 17:09|134217728|1|14549|0|0|-rw-r-----|bc55_ah_appid|hdfs
The table description is:
| hdfspath           | string |
| replication        | int    |
| modificationtime   | string |
| accesstime         | string |
| preferredblocksize | int    |
| blockscount        | int    |
| filesize           | bigint |
| nsquota            | bigint |
| dsquota            | bigint |
| permissionx        | string |
| userx              | string |
| groupx             | string |
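For reference, the table was created along these lines (a rough sketch; a plain pipe-delimited text table matching the OIV output):

create table example (
  hdfspath           string,
  replication        int,
  modificationtime   string,
  accesstime         string,
  preferredblocksize int,
  blockscount        int,
  filesize           bigint,
  nsquota            bigint,
  dsquota            bigint,
  permissionx        string,
  userx              string,
  groupx             string
)
row format delimited
fields terminated by '|'
stored as textfile;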
I need to know how to query just the top-level directories (/tmp, /data) with their total file size, then drill down to the second level (/tmp/hive, /data/lz) and subsequent levels, again with file sizes.
I created something like this:

select substr(hdfspath, 2, instr(substr(hdfspath, 2), '/') - 1) as zone,
       sum(filesize)
from example
group by substr(hdfspath, 2, instr(substr(hdfspath, 2), '/') - 1);

But it is not giving the data I expect, and the file sizes are all in bytes.
Upvotes: 0
Views: 769
Reputation: 11
select joinedpath, sumsize
from
(
    select joinedpath,
           round(sum(filesize)/1024/1024/1024, 2) as sumsize   -- bytes -> GB
    from
    (
        -- first path component, e.g. '/tmp' or '/data'
        select concat('/', split(hdfspath, '/')[1]) as joinedpath,
               accesstime, filesize, userx
        from default.hdfs_meta_d
    ) t
    -- skip rows whose path parsed to the literal string 'null'
    where joinedpath != 'null'
    group by joinedpath
) h;
Please check the query above; it should help you!
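If you need the second level (/tmp/hive, /data/lz) and deeper, the same idea extends by concatenating one more path component. A sketch against the same table (sizes again converted from bytes to GB; paths with fewer than two levels fall out via the null check):

select concat('/', split(hdfspath, '/')[1], '/', split(hdfspath, '/')[2]) as joinedpath,
       round(sum(filesize)/1024/1024/1024, 2) as sumsize
from default.hdfs_meta_d
where split(hdfspath, '/')[2] is not null
group by concat('/', split(hdfspath, '/')[1], '/', split(hdfspath, '/')[2]);

split(hdfspath, '/') on a path like /tmp/hive/... returns an array whose element 0 is the empty string before the leading slash, so index 1 is the first directory and index 2 the second; each deeper level just adds one more index.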
Upvotes: 1
Reputation: 8522
This job is failing due to a heap memory error. Try increasing the heap size before executing the hdfs oiv command, e.g. (the file names here are placeholders):

export HADOOP_OPTS="-Xmx4096m"
hdfs oiv -p Delimited -delimiter "|" -i fsimage -o fsimage.csv

If the command still fails, you might need to move the fsimage to a different machine/server that has more memory and increase the heap using the same environment variable.
Upvotes: 0