Reputation: 2344
I need to copy a folder from the local file system to HDFS. I could not find any example of moving a folder (including all of its subfolders) to HDFS.
$ hadoop fs -copyFromLocal /home/ubuntu/Source-Folder-To-Copy HDFS-URI
Upvotes: 92
Views: 323809
Reputation: 1163
You can also copy it from Java using multiple threads:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HdfsMultiThreadFileCopy {
    private static final int THREAD_COUNT = 10;

    public static void main(String[] args) throws InterruptedException {
        // Placeholder URIs: a local source directory and an HDFS target directory.
        String sourceDir = "file:///path/to/source/directory";
        String targetDir = "hdfs://namenode:8020/path/to/target/directory";
        Configuration conf = new Configuration();
        ExecutorService executorService = Executors.newFixedThreadPool(THREAD_COUNT);
        try {
            FileSystem srcFs = FileSystem.get(URI.create(sourceDir), conf);
            FileSystem dstFs = FileSystem.get(URI.create(targetDir), conf);
            traverseAndCopyDirectory(srcFs, dstFs, new Path(sourceDir), new Path(targetDir), conf, executorService);
        } catch (IOException e) {
            System.err.println("Failed to traverse directory: " + sourceDir + " due to " + e.getMessage());
        } finally {
            // Stop accepting new tasks, then wait for the queued copies to finish.
            executorService.shutdown();
            executorService.awaitTermination(1, TimeUnit.DAYS);
        }
    }

    // Copies a single file; FileUtil.copy streams the data between the two file systems.
    private static void copyFile(FileSystem srcFs, FileSystem dstFs, Path source, Path target, Configuration conf) throws IOException {
        FileUtil.copy(srcFs, source, dstFs, target, false, conf);
    }

    // Walks the source tree, recreating each directory and submitting one copy task per file.
    private static void traverseAndCopyDirectory(FileSystem srcFs, FileSystem dstFs, Path sourceDir, Path targetDir,
            Configuration conf, ExecutorService executorService) throws IOException {
        dstFs.mkdirs(targetDir);
        for (FileStatus status : srcFs.listStatus(sourceDir)) {
            Path target = new Path(targetDir, status.getPath().getName());
            if (status.isFile()) {
                executorService.submit(() -> {
                    try {
                        copyFile(srcFs, dstFs, status.getPath(), target, conf);
                    } catch (IOException e) {
                        System.err.println("Failed to copy file: " + status.getPath() + " due to " + e.getMessage());
                    }
                });
            } else if (status.isDirectory()) {
                // Recursively traverse subdirectories, keeping the same relative layout.
                traverseAndCopyDirectory(srcFs, dstFs, status.getPath(), target, conf, executorService);
            }
        }
    }
}
Upvotes: 0
Use the following commands:
hadoop fs -copyFromLocal <local-nonhdfs-path> <hdfs-target-path>
hadoop fs -copyToLocal <hdfs-input-path> <local-nonhdfs-path>
Or you can use the Hadoop FileSystem API (for example, from a Spark job) to get or put HDFS files.
Hope this is helpful.
Upvotes: 1
Reputation: 35404
hdfs dfs -put <localsrc> <dest>
Checking source and target before placing files into HDFS
[cloudera@quickstart ~]$ ll files/
total 132
-rwxrwxr-x 1 cloudera cloudera 5387 Nov 14 06:33 cloudera-manager
-rwxrwxr-x 1 cloudera cloudera 9964 Nov 14 06:33
-rw-rw-r-- 1 cloudera cloudera 664 Nov 14 06:33 derby.log
-rw-rw-r-- 1 cloudera cloudera 53655 Nov 14 06:33 enterprise-deployment.json
-rw-rw-r-- 1 cloudera cloudera 50515 Nov 14 06:33 express-deployment.json
[cloudera@quickstart ~]$ hdfs dfs -ls
Found 1 items
drwxr-xr-x - cloudera cloudera 0 2017-11-14 00:45 .sparkStaging
Copy files to HDFS using -put or -copyFromLocal
[cloudera@quickstart ~]$ hdfs dfs -put files/ files
Verify the result in HDFS
[cloudera@quickstart ~]$ hdfs dfs -ls
Found 2 items
drwxr-xr-x - cloudera cloudera 0 2017-11-14 00:45 .sparkStaging
drwxr-xr-x - cloudera cloudera 0 2017-11-14 06:34 files
[cloudera@quickstart ~]$ hdfs dfs -ls files
Found 5 items
-rw-r--r-- 1 cloudera cloudera 5387 2017-11-14 06:34 files/cloudera-manager
-rw-r--r-- 1 cloudera cloudera 9964 2017-11-14 06:34 files/
-rw-r--r-- 1 cloudera cloudera 664 2017-11-14 06:34 files/derby.log
-rw-r--r-- 1 cloudera cloudera 53655 2017-11-14 06:34 files/enterprise-deployment.json
-rw-r--r-- 1 cloudera cloudera 50515 2017-11-14 06:34 files/express-deployment.json
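If you would rather check the result from a script than by eye, a rough sketch along these lines could compare the number of local files with the file count that `hdfs dfs -count` reports. The function name verify_upload and the HDFS_CMD variable are made up for this example (HDFS_CMD only exists so the command can be swapped out); it assumes both trees contain only regular files and directories.

```shell
#!/bin/sh
# HDFS_CMD is a hypothetical override hook; by default the real hdfs CLI is used.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# verify_upload LOCAL_DIR HDFS_DIR (hypothetical helper)
# Compares the recursive file count of a local directory with the HDFS copy.
verify_upload() {
    local_dir="$1"
    hdfs_dir="$2"
    # Count regular files under the local directory, recursively.
    local_files=$(find "$local_dir" -type f | wc -l | tr -d ' ')
    # -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    remote_files=$($HDFS_CMD dfs -count "$hdfs_dir" | awk '{print $2}')
    if [ "$local_files" = "$remote_files" ]; then
        echo "OK: $local_files files"
    else
        echo "MISMATCH: local=$local_files hdfs=$remote_files"
        return 1
    fi
}
```

For the session above it would be called as verify_upload files files. A matching count does not prove the contents are identical; it only catches missing or extra files.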
Upvotes: 48
Reputation: 21
Navigate to your "/install/hadoop/datanode/bin" folder, or any path from which you can execute hadoop commands.
To place files in HDFS, the format is: hadoop fs -put "Local system path"/filename.csv "HDFS destination path"
e.g. ./hadoop fs -put /opt/csv/load.csv /user/load
Here /opt/csv/load.csv is the source file path on my local Linux system.
/user/load is the destination path on the HDFS cluster, i.e. "hdfs://hacluster/user/load".
To get files from HDFS to the local system, the format is: hadoop fs -get "/HDFSsourcefilepath" "/localpath"
e.g. hadoop fs -get /user/load/a.csv /opt/csv/
After executing the above command, a.csv is downloaded from HDFS to the /opt/csv folder on the local Linux system.
The uploaded files can also be seen through the HDFS NameNode web UI.
Upvotes: 1
Reputation: 81
To copy a folder or file from local to HDFS, you can use either of the commands below:
hadoop fs -put /path/localpath /path/hdfspath
hadoop fs -copyFromLocal /path/localpath /path/hdfspath
Upvotes: 2
Reputation: 3990
Copying a folder from the local file system copies it to HDFS together with all of its subfolders.
To copy a folder from local to HDFS, you can use:
hadoop fs -put localpath
hadoop fs -copyFromLocal localpath
hadoop fs -put localpath hdfspath
hadoop fs -copyFromLocal localpath hdfspath
If you do not specify an HDFS path, the folder is copied into your HDFS working directory (by default your home directory) under the same name.
To copy from HDFS to local:
hadoop fs -get hdfspath localpath
Upvotes: 32
Reputation: 31
You can use:
1. Copying data from local to HDFS
Syntax: $ hadoop fs -copyFromLocal
EX: $ hadoop fs -copyFromLocal localfile1 HDIR
2. Copying data from HDFS to local
Syntax: $ hadoop fs -copyToLocal < new file name>
EX: $ hadoop fs -copyToLocal hdfs/filename myunx;
Upvotes: 3
Reputation: 6855
You could try:
hadoop fs -put /path/in/linux /hdfs/path
or even
hadoop fs -copyFromLocal /path/in/linux /hdfs/path
By default, both put and copyFromLocal upload directories recursively to HDFS.
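If you do this often, the upload could be wrapped in a small shell function. This is only a sketch: upload_dir and the HDFS_CMD variable are names invented here (HDFS_CMD just makes it easy to dry-run with echo), and it assumes you want the folder placed inside the destination directory.

```shell
#!/bin/sh
# HDFS_CMD is a hypothetical override hook; by default the real hdfs CLI is used.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# upload_dir LOCAL_DIR HDFS_PARENT_DIR (hypothetical helper)
# Recursively uploads LOCAL_DIR so that it ends up as HDFS_PARENT_DIR/<basename of LOCAL_DIR>.
upload_dir() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # Make sure the parent exists first; -put into an existing directory
    # places the source folder (and all its subfolders) inside it.
    $HDFS_CMD dfs -mkdir -p "$dest" && $HDFS_CMD dfs -put "$src" "$dest"
}
```

For example, upload_dir /path/in/linux /hdfs/path would create /hdfs/path if needed and then run the same -put as above. Setting HDFS_CMD=echo before calling it prints the commands instead of running them.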
Upvotes: 118