Reputation: 2344
I need to copy a folder from the local file system to HDFS. I could not find any example of moving a folder (including all its subfolders) to HDFS.
$ hadoop fs -copyFromLocal /home/ubuntu/Source-Folder-To-Copy HDFS-URI
Upvotes: 92
Views: 323809
Reputation: 1163
You can also copy it from Java using multiple threads:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HdfsMultiThreadFileCopy {
    private static final int THREAD_COUNT = 10;

    public static void main(String[] args) throws IOException, InterruptedException {
        // Local source folder and HDFS target folder (adjust both to your environment)
        Path sourceDir = new Path("file:///home/ubuntu/Source-Folder-To-Copy");
        Path targetDir = new Path("hdfs://namenode:8020/path/to/target/directory");
        Configuration conf = new Configuration();
        FileSystem localFs = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(targetDir.toUri(), conf);
        ExecutorService executorService = Executors.newFixedThreadPool(THREAD_COUNT);
        try {
            // Walk the tree on this thread; every file copy is handed to the pool
            traverseAndCopyDirectory(localFs, hdfs, sourceDir, targetDir, executorService);
        } catch (IOException e) {
            System.err.println("Failed to traverse directory: " + sourceDir + " due to " + e.getMessage());
        } finally {
            executorService.shutdown();
            executorService.awaitTermination(1, TimeUnit.DAYS);
        }
    }

    private static void traverseAndCopyDirectory(FileSystem localFs, FileSystem hdfs, Path sourceDir,
            Path targetDir, ExecutorService executorService) throws IOException {
        hdfs.mkdirs(targetDir);
        for (FileStatus status : localFs.listStatus(sourceDir)) {
            Path source = status.getPath();
            Path target = new Path(targetDir, source.getName());
            if (status.isFile()) {
                executorService.submit(() -> copyFile(hdfs, source, target));
            } else if (status.isDirectory()) {
                // Recurse so that subfolders keep their relative layout in HDFS
                traverseAndCopyDirectory(localFs, hdfs, source, target, executorService);
            }
        }
    }

    private static void copyFile(FileSystem hdfs, Path source, Path target) {
        try {
            // delSrc = false keeps the local file; overwrite = true replaces an existing target
            hdfs.copyFromLocalFile(false, true, source, target);
        } catch (IOException e) {
            System.err.println("Failed to copy file: " + source + " due to " + e.getMessage());
        }
    }
}
Upvotes: 0
Reputation:
Use the following commands:
hadoop fs -copyFromLocal <local-nonhdfs-path> <hdfs-target-path>
hadoop fs -copyToLocal <hdfs-input-path> <local-nonhdfs-path>
Alternatively, you can use the Hadoop FileSystem API (for example, from Spark) to get or put HDFS files.
Hope this is helpful.
Upvotes: 1
Reputation: 35404
hdfs dfs -put <localsrc> <dest>
Checking source and target before placing files into HDFS
[cloudera@quickstart ~]$ ll files/
total 132
-rwxrwxr-x 1 cloudera cloudera 5387 Nov 14 06:33 cloudera-manager
-rwxrwxr-x 1 cloudera cloudera 9964 Nov 14 06:33 cm_api.py
-rw-rw-r-- 1 cloudera cloudera 664 Nov 14 06:33 derby.log
-rw-rw-r-- 1 cloudera cloudera 53655 Nov 14 06:33 enterprise-deployment.json
-rw-rw-r-- 1 cloudera cloudera 50515 Nov 14 06:33 express-deployment.json
[cloudera@quickstart ~]$ hdfs dfs -ls
Found 1 items
drwxr-xr-x - cloudera cloudera 0 2017-11-14 00:45 .sparkStaging
Copy files to HDFS using the -put or -copyFromLocal command
[cloudera@quickstart ~]$ hdfs dfs -put files/ files
Verify the result in HDFS
[cloudera@quickstart ~]$ hdfs dfs -ls
Found 2 items
drwxr-xr-x - cloudera cloudera 0 2017-11-14 00:45 .sparkStaging
drwxr-xr-x - cloudera cloudera 0 2017-11-14 06:34 files
[cloudera@quickstart ~]$ hdfs dfs -ls files
Found 5 items
-rw-r--r-- 1 cloudera cloudera 5387 2017-11-14 06:34 files/cloudera-manager
-rw-r--r-- 1 cloudera cloudera 9964 2017-11-14 06:34 files/cm_api.py
-rw-r--r-- 1 cloudera cloudera 664 2017-11-14 06:34 files/derby.log
-rw-r--r-- 1 cloudera cloudera 53655 2017-11-14 06:34 files/enterprise-deployment.json
-rw-r--r-- 1 cloudera cloudera 50515 2017-11-14 06:34 files/express-deployment.json
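To check a result like this programmatically, here is a minimal sketch in plain Java (standard library only, no Hadoop client) that lists the HDFS paths a recursive upload of a local folder should produce. It assumes, as in the session above, that the destination (files) does not exist in HDFS yet, so the folder's contents land directly under it. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExpectedPutTargets {

    // Lists the HDFS paths a recursive "-put localDir hdfsDest" should create,
    // ASSUMING hdfsDest does not exist yet (contents land directly under it).
    public static List<String> expectedTargets(Path localDir, String hdfsDest) throws IOException {
        try (Stream<Path> walk = Files.walk(localDir)) {
            return walk
                    .filter(Files::isRegularFile)               // only files, subfolders are walked into
                    .map(p -> hdfsDest + "/"
                            + localDir.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the local "files" folder from the session above
        Path localDir = Path.of(args.length > 0 ? args[0] : "files");
        for (String target : expectedTargets(localDir, "files")) {
            System.out.println(target);
        }
    }
}
```

Comparing this list with the output of hdfs dfs -ls -R files is a quick way to confirm that files in nested subfolders were uploaded as well.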
Upvotes: 48
Reputation: 21
Navigate to your "/install/hadoop/datanode/bin" folder, or any path from which you can execute Hadoop commands.
To place files in HDFS: Format: hadoop fs -put "Local system path"/filename.csv "HDFS destination path"
e.g. ./hadoop fs -put /opt/csv/load.csv /user/load
Here, /opt/csv/load.csv is the source file path on my local Linux system.
/user/load is the HDFS destination path, i.e. "hdfs://hacluster/user/load" on the cluster.
To get files from HDFS to the local system: Format: hadoop fs -get "/HDFSsourcefilepath" "/localpath"
e.g. hadoop fs -get /user/load/a.csv /opt/csv/
After executing the above command, a.csv is downloaded from HDFS to the /opt/csv folder on the local Linux system.
The uploaded files can also be seen through the HDFS NameNode web UI.
Upvotes: 1
Reputation: 81
To copy a folder from local to HDFS, you can use the command below:
hadoop fs -put /path/localpath /path/hdfspath
or
hadoop fs -copyFromLocal /path/localpath /path/hdfspath
Upvotes: 2
Reputation: 3990
If you copy a folder from local, it is copied to HDFS together with all its subfolders.
To copy a folder from local to HDFS, you can use
hadoop fs -put localpath
or
hadoop fs -copyFromLocal localpath
or
hadoop fs -put localpath hdfspath
or
hadoop fs -copyFromLocal localpath hdfspath
Note:
If you do not specify an HDFS path, the folder is copied to HDFS under the same name.
To copy from HDFS to local:
hadoop fs -get hdfspath localpath
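The note about the destination can be written down as a small rule. This is a sketch in plain Java under the assumption that -put and -copyFromLocal follow cp -r-like semantics: with no destination, the folder goes into the HDFS working directory (normally /user/<username>) under its own name; an existing destination directory receives the folder inside it; a destination that does not exist yet becomes the folder's new name. PutTargetRule, resolveTarget and the sample paths are hypothetical.

```java
public class PutTargetRule {

    // ASSUMED rule for where "hadoop fs -put <localFolder> [<dst>]" places a folder.
    // hdfsHome stands in for the HDFS working directory (normally /user/<username>);
    // dstIsExistingDir says whether <dst> already exists as a directory in HDFS.
    public static String resolveTarget(String localFolder, String dst,
                                       boolean dstIsExistingDir, String hdfsHome) {
        String name = localFolder.replaceAll("/+$", "");     // drop trailing slashes
        name = name.substring(name.lastIndexOf('/') + 1);    // keep the last path component
        if (dst == null) {
            // No destination: same folder name inside the working directory
            return hdfsHome + "/" + name;
        }
        if (dstIsExistingDir) {
            // Existing directory: the folder is created inside it
            return dst.replaceAll("/+$", "") + "/" + name;
        }
        // Destination does not exist yet: it becomes the folder's name
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(resolveTarget("/home/ubuntu/Source-Folder-To-Copy", null, false, "/user/ubuntu"));
        System.out.println(resolveTarget("files/", "files", false, "/user/cloudera"));
        System.out.println(resolveTarget("files/", "/data", true, "/user/cloudera"));
    }
}
```

Running main prints /user/ubuntu/Source-Folder-To-Copy, files and /data/files, one per line.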
Upvotes: 32
Reputation: 31
You can use:
1. Loading data from a local file to HDFS
Syntax: $ hadoop fs -copyFromLocal <localsrc> <hdfs-dst>
EX: $ hadoop fs -copyFromLocal localfile1 HDIR
2. Copying data from HDFS to local
Syntax: $ hadoop fs -copyToLocal <hdfs-src> <new-local-file-name>
EX: $ hadoop fs -copyToLocal hdfs/filename myunx
Upvotes: 3
Reputation: 6855
You could try:
hadoop fs -put /path/in/linux /hdfs/path
or even
hadoop fs -copyFromLocal /path/in/linux /hdfs/path
By default, both put and copyFromLocal upload directories recursively to HDFS.
Upvotes: 118