Reputation: 915
I have a Docker image for Hadoop (in my case it is https://github.com/kiwenlau/hadoop-cluster-docker, but the question applies to any Hadoop Docker image).
I am running the Docker container as follows:
sudo docker run -itd --net=hadoop --user=root -p 50070:50070 \
-p 8088:8088 -p 9000:9000 --name hadoop-master --hostname hadoop-master \
kiwenlau/hadoop
I am writing data to the HDFS filesystem from Java running on the host Ubuntu machine:
FileSystem hdfs = FileSystem.get(new URI("hdfs://0.0.0.0:9000"), configuration);
hdfs.create(new Path("hdfs://0.0.0.0:9000/user/root/input/NewFile.txt"));
How should I mount the volume when starting Docker so that "NewFile.txt" is persisted?
Which "path" inside the container corresponds to the HDFS path "/user/root/input/NewFile.txt"?
Upvotes: 2
Views: 5133
Reputation: 191983
You should inspect the dfs.datanode.data.dir property in the hdfs-site.xml file to know where data is stored in the container filesystem:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///root/hdfs/datanode</value>
    <description>DataNode directory</description>
</property>
Without this file/property, the default location would be file:///tmp/hadoop-${user.name}/dfs/data.
For Docker, note that the default user that runs the processes is the root user.
You will also need to persist the NameNode files, again found in that XML file (the dfs.namenode.name.dir property); see the sketch below.
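As a sketch, assuming that image's hdfs-site.xml also sets dfs.namenode.name.dir to file:///root/hdfs/namenode (check the file to be sure), bind-mounting both directories from the host could look like:

sudo docker run -itd --net=hadoop --user=root -p 50070:50070 \
    -p 8088:8088 -p 9000:9000 --name hadoop-master --hostname hadoop-master \
    -v $PWD/hdfs/datanode:/root/hdfs/datanode \
    -v $PWD/hdfs/namenode:/root/hdfs/namenode \
    kiwenlau/hadoop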
Which "path" inside the container corresponds to the HDFS path "/user/root/input/NewFile.txt"
The container path holds the blocks of the HDFS file, not the whole file itself
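So to read the file back, go through HDFS rather than the container filesystem, for example:

# read the file through HDFS, not via the raw block files
sudo docker exec hadoop-master hdfs dfs -cat /user/root/input/NewFile.txt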
Upvotes: 4