covfefe

Reputation: 2675

Command to find largest file in hadoop directory

I am trying to find the largest file in a given directory on a hadoop filesystem. I found this link: http://www.tecmint.com/find-top-large-directories-and-files-sizes-in-linux/, which showed the following command for finding the largest file:

find /home/tecmint/Downloads/ -type f -exec du -Sh {} + | sort -rh | head -n 5

But when I ran

hadoop fs -find [hadoop location] -type f -exec du -Sh {} + | sort -rh | head -n 5

I got find: Unexpected argument: -type.

I also ran hadoop fs -du -a | sort -n | head -n 1, but the result I got was not the largest file in the directory. I would appreciate any help.

Upvotes: 4

Views: 8443

Answers (1)

Hamza Zafar

Reputation: 1360

On Linux you can run the following command to find the largest file in the Desktop directory; remove the -r argument of sort if you instead want the smallest file.

du ~/Desktop/* | sort -n -r | head -n 1
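Note that du ~/Desktop/* only looks at the top level and reports each subdirectory as a single total. If you also want to descend into subdirectories and compare individual files only, a sketch along these lines should work (assuming GNU find and du):

# recurse into subdirectories, size each regular file, keep the largest
find ~/Desktop -type f -exec du {} + | sort -n -r | head -n 1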

For HDFS, you can try the following command:

hadoop fs -du <Path-in-HDFS> | sort -n -r | head -n 1
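Keep in mind that hadoop fs -du prints one summary line per entry directly under the given path, so the top result may be a directory rather than a single file. If you need the largest individual file anywhere under the path, one hedged alternative is to list recursively and sort on the size column (column 5 in the usual hadoop fs -ls output):

# list recursively, keep only regular files (lines starting with '-'), sort by the size column
hadoop fs -ls -R <Path-in-HDFS> | grep '^-' | sort -k5 -n -r | head -n 1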

Upvotes: 4
