I want to get the most recently updated folder in one of my HDFS directories. I was able to get the latest file on a local file system, but I'm not sure how to do it for HDFS. I tried with a shell script.
With Hadoop 2.6, I got it to work with the following command:
hdfs dfs -ls -R ${DIR} | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
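where:
hdfs dfs -ls -R ${DIR} : lists all files and directories under ${DIR} recursively
grep "^d" : keeps only the directories (entries whose permission string starts with d)
sort -k6,7 : sorts them by modification date and time (columns 6 and 7)
tail -1 : keeps the listing of the most recently modified directory
tr -s ' ' : squeezes runs of spaces into one, so that cut can split on single spaces
cut -d' ' -f8 : keeps only the eighth field, which is the directory path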
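To check the text processing without a cluster, the same pipeline can be wrapped in a function that reads a listing from stdin. This is a minimal sketch: the function name latest_dir_from_listing is hypothetical, and LC_ALL=C is an addition (not in the original command) so that sort compares bytes regardless of the locale. The sample data is the listing from the example in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the answer's pipeline to an
# `hdfs dfs -ls -R` listing read from stdin.
latest_dir_from_listing() {
  grep '^d' |               # keep directory entries only
    LC_ALL=C sort -k6,7 |   # order by date and time columns (byte-wise, added assumption)
    tail -1 |               # last line = most recently modified directory
    tr -s ' ' |             # collapse runs of spaces into single spaces
    cut -d' ' -f8           # field 8 is the path
}

# Offline check using the sample listing from the example.
sample='drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:11 /tmp/a/b/c
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC
-rw-r--r-- 3 hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txt
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d/e
-rw-r--r-- 3 hduser supergroup 6 2017-08-08 03:10 /tmp/a/b/c/f1.txt'

printf '%s\n' "$sample" | latest_dir_from_listing
```

Against a live cluster you would pipe the real listing into it, e.g. hdfs dfs -ls -R "${DIR}" | latest_dir_from_listing (assuming DIR is set).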
Example:
[user@nodeX]$ hdfs dfs -ls -R /tmp/a
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:11 /tmp/a/b/c
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC
-rw-r--r-- 3 hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txt
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d/e
-rw-r--r-- 3 hduser supergroup 6 2017-08-08 03:10 /tmp/a/b/c/f1.txt
Solution:
[user@nodeX]$ hdfs dfs -ls -R /tmp/a | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
/tmp/a/b/c/CC
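As an alternative sketch (not part of the original answer), a single awk pass can keep the directory with the greatest "date time" key instead of chaining sort, tail, tr and cut. The function name latest_dir_awk is hypothetical. It re-joins fields 8 onward, so paths containing single spaces survive, although runs of spaces inside a path would still be collapsed. On equal timestamps it keeps the first directory encountered, whereas sort | tail -1 keeps the one that sorts last, so ties may resolve differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk pass over an `hdfs dfs -ls -R` listing on stdin.
latest_dir_awk() {
  LC_ALL=C awk '
    /^d/ {
      key = $6 " " $7                # e.g. "2017-08-08 03:12"
      if (best == "" || key > best) {
        best = key
        path = $8
        for (i = 9; i <= NF; i++)    # re-join a path that contained spaces
          path = path " " $i
      }
    }
    END { if (best != "") print path }
  '
}

# Same sample listing as in the example above.
sample='drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:11 /tmp/a/b/c
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC
-rw-r--r-- 3 hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txt
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d
drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d/e
-rw-r--r-- 3 hduser supergroup 6 2017-08-08 03:10 /tmp/a/b/c/f1.txt'

printf '%s\n' "$sample" | latest_dir_awk
```

The string comparison works because the timestamps are in year-month-day hour:minute order, so lexical order matches chronological order.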