Reputation: 3540
Here is the sample snippet I use to write a file to HDFS:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
public class WriteFileToHDFS {
    public static void main(String[] args) throws IOException, URISyntaxException
    {
        System.setProperty("hadoop.home.dir", "/");
        System.setProperty("HADOOP_USER_NAME", "hdfs");
        // 1. Get an instance of Configuration
        Configuration configuration = new Configuration();
        // 2. Create an InputStream to read the data from the local file
        InputStream inputStream = new BufferedInputStream(new FileInputStream("/Users/rabbit/Research/hadoop/sample_files/TAO.mp4"));
        // 3. Get the HDFS instance
        FileSystem hdfs = FileSystem.get(new URI("hdfs://192.168.143.150:9000"), configuration);
        // 4. Open an OutputStream to write the data; this is obtained from the FileSystem
        OutputStream outputStream = hdfs.create(new Path("hdfs://192.168.143.150:9000/filestore/TAO.mp4"),
            new Progressable() {
                @Override
                public void progress() {
                    // Called periodically while data is written to the DataNodes
                    System.out.println("....");
                }
            });
        try
        {
            // Copy with a 4096-byte buffer; "false" leaves closing the streams to the finally block
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
        }
        finally
        {
            IOUtils.closeStream(inputStream);
            IOUtils.closeStream(outputStream);
        }
    }
}
I expected the block to be written as /data/hadoop-data/dn/current/blk_1073741869
Instead, it is written as /data/hadoop-data/dn/current/BP-1308070615-172.22.131.23-1533215887051/current/finalized/subdir0/subdir0/blk_1073741869
I do not understand where the BP-1308070615-172.22.131.23-1533215887051/current/finalized/subdir0/subdir0 part of the path comes from.
How is the path structure defined when writing to a DataNode in Hadoop?
Upvotes: 2
Views: 386
Reputation: 3540
BP stands for "block pool", a collection of blocks belonging to a single HDFS namespace.
The next part, 1308070615, is a randomly generated integer.
The IP address 172.22.131.23 is the address of the NameNode that originally created the block pool.
The last part, 1533215887051, is the creation time of the namespace.
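To make the breakdown above concrete, here is a minimal sketch that splits a block pool directory name into those three parts. The class BlockPoolIdParts and its parse method are hypothetical helpers written for illustration, not part of the Hadoop API. It also assumes the last field is a Unix timestamp in milliseconds, which the 13-digit value suggests.

```java
import java.time.Instant;

/** Hypothetical helper for illustration only; not a Hadoop class. */
final class BlockPoolIdParts {
    final long randomPart;        // randomly generated integer
    final String nameNodeAddress; // address of the NameNode that created the pool
    final Instant created;        // creation time (assumed to be epoch milliseconds)

    private BlockPoolIdParts(long randomPart, String nameNodeAddress, Instant created) {
        this.randomPart = randomPart;
        this.nameNodeAddress = nameNodeAddress;
        this.created = created;
    }

    /** Parses names shaped like BP-<random>-<NameNode address>-<creation time>. */
    static BlockPoolIdParts parse(String id) {
        int first = id.indexOf('-');
        int second = id.indexOf('-', first + 1);
        int last = id.lastIndexOf('-');
        if (!id.startsWith("BP-") || second < 0 || last <= second) {
            throw new IllegalArgumentException("Not a block pool ID: " + id);
        }
        long random = Long.parseLong(id.substring(first + 1, second));
        // Everything between the second and the last dash is the address
        String address = id.substring(second + 1, last);
        long millis = Long.parseLong(id.substring(last + 1));
        return new BlockPoolIdParts(random, address, Instant.ofEpochMilli(millis));
    }

    public static void main(String[] args) {
        BlockPoolIdParts p = parse("BP-1308070615-172.22.131.23-1533215887051");
        System.out.println("random part : " + p.randomPart);
        System.out.println("NameNode    : " + p.nameNodeAddress);
        System.out.println("created     : " + p.created);
    }
}
```

Running main on the directory name from the question prints the three fields separately, which can be handy for checking which NameNode a DataNode directory belongs to.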
Upvotes: 0
Reputation: 1166
BP stands for "block pool", a collection of blocks belonging to a single HDFS namespace.
This is how HDFS manages data blocks; the following link explains the directory layout in detail:
https://hortonworks.com/blog/hdfs-metadata-directories-explained/
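As for the finalized/subdir0/subdir0 part: finalized holds blocks that have been completely written, and the two subdir levels are derived from the block ID so that a single directory does not fill up with files. The sketch below mirrors the bit arithmetic as I recall it from DatanodeUtil.idToBlockDir in recent Hadoop releases. Treat the shifts and the 0x1F mask as an assumption to check against the source of your Hadoop version; older releases used a different mask or a different scheme. The class BlockDirSketch is hypothetical and does not call any Hadoop code.

```java
/** Hypothetical illustration; the real logic lives inside the DataNode. */
final class BlockDirSketch {
    // Assumed layout: two directory levels, each selected by 5 bits of the block ID
    static String blockSubdirs(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F);
        int d2 = (int) ((blockId >> 8) & 0x1F);
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        // Block ID taken from the file name blk_1073741869 in the question
        System.out.println(blockSubdirs(1073741869L));
    }
}
```

For the block in the question, 1073741869 is 0x4000002D, so both masked values are 0, which matches the subdir0/subdir0 seen on disk.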
Upvotes: 1