Navneet Kumar

Reputation: 3752

How to list only the file names in HDFS

I would like to know whether there is any command/expression to get only the file name in Hadoop. I need to fetch only the name of the file, but when I do hadoop fs -ls it prints the whole path.

I tried the command below, but I am wondering whether there is a better way to do it.

hadoop fs -ls <HDFS_DIR>|cut -d ' ' -f17 
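For illustration only (the directory and file names below are made up), this is the typical shape of the ls output versus what I actually want:

hadoop fs -ls /user/navneet/data
# Found 2 items
# -rw-r--r--   3 hdfs hadoop   1048576 2015-05-11 10:30 /user/navneet/data/part-00000
# -rw-r--r--   3 hdfs hadoop   2097152 2015-05-11 10:31 /user/navneet/data/part-00001

# desired output:
# part-00000
# part-00001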

Upvotes: 37

Views: 51411

Answers (7)

MichealKum

Reputation: 500

The following command will return filenames only:

hdfs dfs -stat "%n" my/path/*
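For example (the path and names below are hypothetical), this prints one bare file name per matching entry:

hdfs dfs -stat "%n" /user/me/data/*
# part-00000
# part-00001
# _SUCCESS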

Edit (Feb 04 '21):

Actually, for the last few years I have been using

hdfs dfs -ls -d my/path/* | awk '{print $8}'

and

hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}'
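A rough note on the difference between the two (my reading, not from the original answer): -ls -d with a glob lists the matching entries themselves rather than descending into directories, while grep -e "^-" keeps only regular files, because directory lines start with d. Either variant can be finished off with basename to drop the directory part, for example:

hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}' | while read -r fn; do basename "$fn"; done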

Upvotes: 48

MichealKum

Reputation: 500

One more solution I use often. A few related variations (combined into a small helper after the list):

  • list files and dirs only, without the Found x items header:

hdfs dfs -ls -d mypath/*

  • keep only the full path:

hdfs dfs -ls -d mypath/* | awk '{print $8}'

  • only the file names:

hdfs dfs -ls -d mypath/* | awk '{print $8}'| while read fn; do basename $fn; done

  • and additionally, use path patterns if necessary:

hdfs dfs -ls -d {my,his}path/*.{txt,doc}
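Putting the pieces together, a small helper (the function name hdfs_names is mine, not from the answer; it assumes the default 8-column ls output):

# hypothetical helper wrapping the pipeline above:
# prints the bare file name of every entry matched by the given HDFS glob
hdfs_names() {
  hdfs dfs -ls -d "$@" | awk '{print $8}' | while read -r fn; do
    basename "$fn"
  done
}

# usage (the glob is expanded by HDFS itself, so quoting it is fine):
hdfs_names 'mypath/*'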

Upvotes: 2

loneStar

Reputation: 4010

hadoop fs -ls -C /path/* | xargs -n 1 basename

Upvotes: 1

anirudh.vyas

Reputation: 572

I hope this helps someone. With version 2.8.x+ (available in 3.x as well), the -C option prints just the paths:

hadoop fs -ls -C /paths/
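The paths and names below are made up, but they show the shape of -C output (one full path per line) and how to reduce it to base names if needed:

hadoop fs -ls -C /user/me/data
# /user/me/data/part-00000
# /user/me/data/part-00001

hadoop fs -ls -C /user/me/data | xargs -n 1 basename
# part-00000
# part-00001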

Upvotes: 30

Vinod ram

Reputation: 95

The command below returns only the file names in the directory. awk splits each line on '/' and prints the last field, which is the file name.

hdfs dfs -ls /<folder> | awk -F'/' '{print $NF}'

Upvotes: 0

Jakub Kotowski

Reputation: 7571

It seems hadoop ls does not support any options to output just the filenames, or even just the last column.

If you want to get the last column reliably, you should first squeeze the whitespace into single spaces, so that you can then address the last column:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8

This will get you just the last column, but with the whole path for each file. If you want just the filenames, you can use basename, as @rojomoke suggests:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8 | xargs -n 1 basename

I also filtered out the first line, which says Found x items.

Note: beware that, as @felix-frank notes in the comments, the above command will not correctly preserve file names containing multiple consecutive spaces. Hence a more correct solution proposed by Felix:

hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'
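A brief note on why this handles spaces (my reading, not part of the original answer): split " ", $_, 8 produces at most 8 fields, so the first 7 whitespace-separated columns are split off and everything that remains, including any runs of spaces inside the file name, is kept intact as field [7]. A quick local demonstration on a made-up ls-style line:

printf '%s\n' '-rw-r--r--   3 hdfs hadoop  1024 2015-05-11 10:30 /tmp/name with  spaces.txt' \
  | perl -wlne'print +(split " ",$_,8)[7]'
# /tmp/name with  spaces.txt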

Upvotes: 42

rojomoke

Reputation: 4015

Use the basename command, which strips any prefix ending in '/' from the string.

basename $(hadoop fs -ls)
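A caveat, not part of the original answer: basename takes a single path (a second argument is treated as a suffix to strip), and hadoop fs -ls also prints the Found x items header and the metadata columns, so in practice this is usually combined with one of the pipelines above, for example:

hadoop fs -ls -C /path | xargs -n 1 basename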

Upvotes: 0
