Aleksandr Levchuk

Reputation: 3881

Hadoop fs lookup for block size?

In Hadoop fs, how do I look up the block size for a particular file?

I was primarily interested in a command-line solution, something like:

hadoop fs ... hdfs://fs1.data/...

But it looks like that does not exist. Is there a Java solution?

Upvotes: 17

Views: 33390

Answers (5)

seunggabi

Reputation: 1822

Try the code below:

path=hdfs://a/b/c

# The third column of `hdfs dfs -count` output is the content size in bytes.
size=$(hdfs dfs -count "${path}" | awk '{print $3}')
echo "${size}"

Upvotes: 1

Pety

Reputation: 39

To display the actual block size of an existing file within HDFS, I used:

[pety@master1 ~]$ hdfs dfs -stat %o /tmp/testfile_64
67108864

Upvotes: 0

Eponymous

Reputation: 6811

The fsck commands in the other answers list the blocks, so you can see how many there are. However, to see the actual block size in bytes with no extra cruft, do:

hadoop fs -stat %o /filename

The default block size is:

hdfs getconf -confKey dfs.blocksize

Details about units

The units for the block size are not documented in the hadoop fs -stat command; however, looking at the source line and the docs for the method it calls, we can see that it uses bytes and cannot report block sizes over about 9 exabytes.
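
For reference, a minimal Java sketch of the same per-file lookup through the public FileSystem API, whose FileStatus.getBlockSize() returns a long number of bytes (the class name and argument handling here are just illustrative; it assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeOfFile {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and dfs.* settings from the XML config files on the classpath.
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // e.g. /filename
        FileSystem fs = path.getFileSystem(conf);
        FileStatus status = fs.getFileStatus(path);
        // Block size of this particular file, in bytes.
        System.out.println(status.getBlockSize());
    }
}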

The units for the hdfs getconf command may not be bytes. It returns whatever string is used for dfs.blocksize in the configuration file. (This can be seen in the source for the final function and its indirect caller.)
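
As a sketch of that distinction (again assuming hdfs-site.xml is on the classpath; the class name is illustrative): Configuration.get returns the raw configured string, while Configuration.getLongBytes parses it into bytes, honoring size suffixes such as 128m.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DefaultBlockSize {
    public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml.
        Configuration conf = new HdfsConfiguration();
        // Raw string exactly as configured, e.g. "134217728" or "128m".
        System.out.println(conf.get("dfs.blocksize"));
        // Parsed into bytes, honoring binary suffixes (k, m, g, ...).
        System.out.println(conf.getLongBytes("dfs.blocksize", 134217728L));
    }
}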

Upvotes: 48

Chris Zheng

Reputation: 1499

It seems hadoop fs doesn't have an option to do this.

But hadoop fsck can.

You can try this:

$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks

Upvotes: 14

Aleksandr Levchuk

Reputation: 3881

I think it should be doable with:

hadoop fsck /filename -blocks

but I get "Connection refused".

Upvotes: 1
