Ophir Yoktan

Reputation: 8449

Strange behaviour in hdfs moveFromLocal when directories exist

I'm trying to move files (a directory tree) from the local file system to HDFS using the moveFromLocal hdfs shell command.

If the destination subdirectories don't exist, everything works fine. But if they do exist (which is the general case, since files are added to existing directories), another level is created in the hierarchy.

Example:

The original structure on disk

$ find src
src
src/a
src/a/2
src/a/2/file1
src/a/1
src/a/1/file1
src/a/4
src/a/4/file1
src/a/3
src/a/3/file1
src/b
src/b/2
src/b/2/file1
src/b/1
src/b/1/file1
src/b/4
src/b/4/file1
src/b/3
src/b/3/file1

The move command

$ hdfs dfs -moveFromLocal src/* /dst

The result (as expected)

$ hdfs dfs -ls  -R /dst
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/1/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/2/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/3/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/4/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/1/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/2/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/3/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/4/file1

The local files in the 2nd batch

$ find src
src
src/a
src/a/2
src/a/2/file2
src/a/1
src/a/1/file2
src/a/4
src/a/4/file2
src/a/3
src/a/3/file2
src/b
src/b/2
src/b/2/file2
src/b/1
src/b/1/file2
src/b/4
src/b/4/file1
src/b/3
src/b/3/file2

Moving the 2nd batch to hdfs

$ hdfs dfs -moveFromLocal src/* /dst

The 2nd batch on hdfs

Note that all the "file2" files end up in a doubled hierarchy (a/a instead of just a):

$ hdfs dfs -ls  -R /dst
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/1/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/2/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/3/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/a/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/a/4/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a/a
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a/a/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/a/a/1/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a/a/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/a/a/2/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a/a/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/a/a/3/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/a/a/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/a/a/4/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/1/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/2/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/3/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:39 /dst/b/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:39 /dst/b/4/file1
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b/b
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b/b/1
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/b/b/1/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b/b/2
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/b/b/2/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b/b/3
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/b/b/3/file2
drwxr-xr-x   - root supergroup          0 2014-02-02 03:42 /dst/b/b/4
-rw-r--r--   3 root supergroup          0 2014-02-02 03:42 /dst/b/b/4/file1

EDIT

I understand that this behavior is by design. I'm open to alternative solutions that perform the same move but merge the files into the existing directories.

Upvotes: 1

Views: 736

Answers (2)

Evgeny Benediktov

Reputation: 1399

This behavior is (kind of) consistent with mv on Unix: although its man page doesn't document it, mv refuses to rename a directory onto another directory if the target directory contains files:

[evgeny@dev1]$ mv src/* dst/
mv: cannot move 'src/subsrc' to 'dst/subsrc': Directory not empty

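Unfortunately, you have to clean the dst directory first: "hadoop fs -rmr dst" (in Hadoop 2.x, -rmr is deprecated in favor of "hadoop fs -rm -r dst"). Note that this deletes everything already stored under dst.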

Upvotes: 1

Ophir Yoktan

Reputation: 8449

org.apache.hadoop.fs.FileContext (a wrapper for, or replacement of, org.apache.hadoop.fs.FileSystem) has a cleaner API.

Among other things, its rename will (optionally) fail if the destination directory exists. This does not perform the requested merge, but at least it raises an exception and won't create unwanted subdirectories.
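
One way to get the merge itself, as a minimal sketch rather than a tested tool: skip the directories and move the files one by one, creating each target directory first. The function name merge_move_to_hdfs and the HDFS_CMD variable (which lets the client be swapped for echo in a dry run) are made up for illustration. The sketch assumes an hdfs client that supports "-mkdir -p", file names without newlines, and a destination root given without a trailing slash.

```shell
# Hypothetical override for dry runs; defaults to the real hdfs client.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Move every regular file under a local root into the matching HDFS directory.
# The body runs in a subshell, so "exit" only ends this function.
merge_move_to_hdfs() (
    src_root=${1%/}    # local source root, trailing slash removed
    dst_root=$2        # HDFS destination root (assumed: no trailing slash)

    # Only regular files are listed, so a directory is never passed as a source.
    find "$src_root" -type f | while IFS= read -r file; do
        rel=${file#"$src_root"/}    # path relative to the source root
        case "$rel" in
            */*) dst_dir="$dst_root/${rel%/*}" ;;   # file inside a subdirectory
            *)   dst_dir="$dst_root" ;;             # file directly under the root
        esac
        # -mkdir -p succeeds even when the directory already exists.
        $HDFS_CMD dfs -mkdir -p "$dst_dir" || exit 1
        # The target is an existing directory, so the file lands directly in it.
        $HDFS_CMD dfs -moveFromLocal "$file" "$dst_dir/" || exit 1
    done
)
```

With the tree from the question this would be called as "merge_move_to_hdfs src /dst". A file whose name already exists at the destination (b/4/file1 in the 2nd batch) should make moveFromLocal fail and stop the loop, and the emptied local directories are left in place.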

Upvotes: 0
