Jaya

Reputation: 128

Get the last updated folder in HDFS

I want to get the most recently updated folder from one of my HDFS directories. I was able to get the latest file in hdfs file system, but I am not sure how to do it for an HDFS one. I tried with a shell script.

Upvotes: 1

Views: 3610

Answers (1)

PradeepKumbhar

Reputation: 3421

With Hadoop 2.6, I could get it to work with the following command:

hdfs dfs -ls -R ${DIR} | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8

where,

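hdfs dfs -ls -R ${DIR} : lists all files and directories under ${DIR} recursively

grep "^d" : keeps only the directories (lines whose permission string starts with d)

sort -k6,7 : sorts them by fields 6 and 7, the modification date and time

tail -1 : keeps the last line, i.e. the listing for the most recently modified directory

tr -s ' ' : squeezes runs of spaces into a single space so that cut can split the fields

cut -d' ' -f8 : prints only field 8, the directory path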

Example:

[user@nodeX]$ hdfs dfs -ls -R /tmp/a 
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:11 /tmp/a/b/c
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:12 /tmp/a/b/c/CC
-rw-r--r--   3 hduser supergroup          0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txt
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b/c/d
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b/c/d/e
-rw-r--r--   3 hduser supergroup          6 2017-08-08 03:10 /tmp/a/b/c/f1.txt

Solution:

[user@nodeX]$ hdfs dfs -ls -R /tmp/a | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8

/tmp/a/b/c/CC
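
One limitation of the pipeline above is that cut -d' ' -f8 truncates a path that contains spaces. As a rough sketch (not part of the original command), the same idea can be wrapped in a small shell function that uses awk instead. The function name latest_dir is made up for illustration, and it assumes the listing has the same column layout as the example output above (date in field 6, time in field 7, path from field 8 onward):

```shell
#!/usr/bin/env bash

# latest_dir (hypothetical helper): reads `hdfs dfs -ls -R` style output on
# stdin and prints the path of the most recently modified directory.
latest_dir() {
  awk '
    /^d/ {                          # directories only, like grep "^d"
      stamp = $6 " " $7             # date and time, e.g. "2017-08-08 03:12"
      if (stamp >= best) {          # string comparison works for this format
        best = stamp
        # Rebuild the path from field 8 onward so names with spaces survive.
        path = $8
        for (i = 9; i <= NF; i++) path = path " " $i
      }
    }
    END { if (path != "") print path }
  '
}
```

It would be used as hdfs dfs -ls -R "${DIR}" | latest_dir. If two directories share the same timestamp, the one listed later wins, and runs of several spaces inside a name are collapsed to one, so treat it as a starting point rather than a complete solution.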

Upvotes: 4
