Reputation: 830
I'm running image analysis algorithms on Apache Spark using Python.
The final output, which includes images (NumPy 2D arrays) and plots (built with Matplotlib subplots), needs to be saved in a common image format such as JPEG, PNG, or TIFF on HDFS.
As in the code below, I'd like each executor to process its part of the RDD and save the resulting image files. Is there a way to save files on HDFS from each executor? Please share any ideas if you have any.
Thanks!
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

ax1 = plt.subplot(131)
plt.subplots_adjust(wspace=0.4)
im = plt.imshow(map1, interpolation='nearest')
divider = make_axes_locatable(ax1)
cax = divider.append_axes("right", size="2.5%", pad=0.1)
cb = plt.colorbar(im, cax=cax)
ax1.set_title("Test1")
ax2 = plt.subplot(132)
plt.imshow(map2, cmap='gray', interpolation='nearest')
ax2.set_title("Test2")
ax3 = plt.subplot(133)
plt.imshow(map3, cmap='gray', interpolation='nearest')
ax3.set_title("Test3")
plt.savefig(filepathname, bbox_inches='tight', pad_inches=0)
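One common workaround is to have each executor render its figure to local disk and then push the file to HDFS with the `hdfs dfs -put` CLI. Below is a minimal sketch of that idea, not code from the post: the helper name `save_panels`, its parameters, and the `subprocess` call to the `hdfs` command are all my assumptions. The function is meant to be called from `rdd.foreach` so each executor writes its own images.

```python
import os
import subprocess
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; executors have no display
import matplotlib.pyplot as plt


def save_panels(name, map1, map2, map3, hdfs_dir=None):
    """Render the three panels to a local PNG; optionally push it to HDFS.

    `hdfs_dir` is hypothetical: when given, the file is uploaded via the
    `hdfs dfs -put` CLI, which must be on the executor's PATH.
    Returns the local path of the saved image.
    """
    local_path = os.path.join(tempfile.gettempdir(), "%s.png" % name)
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
    ax1.imshow(map1, interpolation="nearest")
    ax1.set_title("Test1")
    ax2.imshow(map2, cmap="gray", interpolation="nearest")
    ax2.set_title("Test2")
    ax3.imshow(map3, cmap="gray", interpolation="nearest")
    ax3.set_title("Test3")
    fig.savefig(local_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)  # free figure memory on long-running executors
    if hdfs_dir is not None:
        subprocess.check_call(["hdfs", "dfs", "-put", "-f",
                               local_path, hdfs_dir])
    return local_path


# On the driver, assuming an RDD of (name, (map1, map2, map3)) pairs:
# rdd.foreach(lambda kv: save_panels(kv[0], *kv[1], hdfs_dir="/images"))
```

Note that the upload step shells out to the `hdfs` client, so it only works if the Hadoop binaries are installed on every worker node.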
Upvotes: 0
Views: 3110
Reputation: 10450
In order to use the following function without any changes:
plt.savefig(filepathname, bbox_inches = 'tight', pad_inches=0)
you'll need to mount HDFS as a folder on your local machine.
To do that, install hadoop-hdfs-fuse (http://www.cloudera.com/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_28.html).
The following commands assume an Ubuntu machine (see the link above for other distributions):
sudo apt-get install hadoop-hdfs-fuse
sudo mkdir -p <mount_point>
hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
If apt-get install hadoop-hdfs-fuse fails, you can do the following and then repeat the three lines above:
wget http://archive.cloudera.com/cdh5/one-click-install/trusty/amd64/cdh5-repository_1.0_all.deb
sudo dpkg -i cdh5-repository_1.0_all.deb
sudo apt-get update
for more info: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_cdh5_install.html#topic_4_4_1__p_44
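Once the mount is in place, ordinary file APIs write straight through to HDFS, so the savefig call from the question needs nothing more than a path under the mount point. A minimal sketch (the wrapper function and the example mount path are my own; substitute whatever <mount_point> you chose above):

```python
import matplotlib
matplotlib.use("Agg")  # safe on headless cluster nodes
import matplotlib.pyplot as plt
import numpy as np


def save_through_mount(out_path):
    """Save a figure with plain plt.savefig.

    In practice `out_path` would live under the FUSE mount, e.g.
    "/mnt/hdfs/images/result.png" (a hypothetical mount point); any
    writable path behaves identically, which is the point of mounting.
    """
    plt.imshow(np.random.rand(8, 8), cmap="gray", interpolation="nearest")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
```

Keep in mind that for executors (rather than the driver) to write this way, the mount has to exist on every worker node.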
Upvotes: 1