Reputation: 1939
I am currently starting a project titled "Cloud computing for time series mining algorithms using Hadoop". The data I have consists of HDF files totalling over a terabyte. As far as I know, Hadoop expects text files as input for further processing (MapReduce tasks). So one option is to convert all my .hdf files to text files, which is going to take a lot of time.
The other is to find a way to use the raw HDF files directly in MapReduce programs. So far I have not been successful in finding any Java code that reads HDF files and extracts data from them. If somebody has a better idea of how to work with HDF files, I would really appreciate the help.
Thanks, Ayush
Upvotes: 3
Views: 4741
Reputation: 523
SciMATE (http://www.cse.ohio-state.edu/~wayi/papers/SciMATE.pdf) is a good option. It is built on a variant of MapReduce that has been shown to run many scientific applications much more efficiently than Hadoop.
Upvotes: 1
Reputation: 2724
If you cannot find any Java code and can work in other languages, then you can use Hadoop Streaming.
Upvotes: 1
Reputation: 36
For your first option, you could use a conversion tool like HDF dump to dump the HDF files to text format. Otherwise, you could write a program yourself using a Java library that reads HDF files and writes the data out as text.
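As a rough sketch of that do-it-yourself conversion, something along these lines should work with the netCDF-Java library (which can read HDF5 files directly); the dataset name "/temperature" is only a placeholder for whatever your files actually contain:

```java
import java.io.PrintWriter;
import ucar.ma2.Array;
import ucar.ma2.IndexIterator;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

// Sketch: dump one dataset of an HDF5 file to tab-separated text.
// "/temperature" is a placeholder dataset name.
public class Hdf5ToText {
    public static void main(String[] args) throws Exception {
        String inPath = args[0];   // e.g. series_0001.h5
        String outPath = args[1];  // e.g. series_0001.txt

        try (NetcdfFile ncFile = NetcdfFile.open(inPath);
             PrintWriter out = new PrintWriter(outPath)) {

            Variable var = ncFile.findVariable("/temperature");
            Array data = var.read();   // reads the whole dataset into memory

            long i = 0;
            IndexIterator it = data.getIndexIterator();
            while (it.hasNext()) {
                // one "index <TAB> value" line per element, easy to split in a mapper
                out.println(i++ + "\t" + it.getDoubleNext());
            }
        }
    }
}
```

For terabyte-scale data you would read and write in slices (e.g. Variable.read(origin, shape)) instead of pulling a whole dataset into memory, but the overall structure stays the same.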
For your second option, SciHadoop is a good example of how to read scientific datasets from Hadoop. It uses the NetCDF-Java library to read NetCDF files. Hadoop does not support the POSIX API for file I/O, so SciHadoop uses an extra software layer to translate the POSIX calls made by the NetCDF-Java library into HDFS (Hadoop) API calls. If SciHadoop does not already support HDF files, you might have to go down a somewhat harder path and develop a similar solution yourself.
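If writing such a translation layer is more than you want to take on, a simpler (if cruder) pattern is to keep each HDF file whole on HDFS, give the job a plain text file listing the HDF paths, and have each mapper copy its file to local disk before opening it with the HDF/NetCDF library. The mapper below is only a sketch of that pattern, not SciHadoop's actual mechanism; "/temperature" and the output key/value types are again placeholders:

```java
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import ucar.ma2.Array;
import ucar.ma2.IndexIterator;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

// Input: a text file with one HDFS path to an HDF file per line.
// Each mapper pulls its file to local disk and reads it there, because
// the HDF/NetCDF libraries expect ordinary POSIX file access.
public class HdfFileMapper
        extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String hdfsPath = value.toString().trim();
        if (hdfsPath.isEmpty()) {
            return;
        }

        File local = File.createTempFile("hdf-", ".h5");
        local.delete();  // keep only the unique name; the copy below creates the file

        // Copy the whole HDF file out of HDFS to the local filesystem.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        fs.copyToLocalFile(new Path(hdfsPath), new Path(local.getAbsolutePath()));

        try (NetcdfFile ncFile = NetcdfFile.open(local.getAbsolutePath())) {
            Variable var = ncFile.findVariable("/temperature");
            Array data = var.read();

            long i = 0;
            IndexIterator it = data.getIndexIterator();
            while (it.hasNext()) {
                context.write(new LongWritable(i++),
                              new DoubleWritable(it.getDoubleNext()));
            }
        } finally {
            local.delete();
        }
    }
}
```

The obvious cost of this pattern is the local copy of every file and the loss of data locality, which is what SciHadoop's translation layer is designed to avoid, but it needs no changes to Hadoop or to the HDF libraries.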
Upvotes: 2
Reputation: 17489
Here are some resources:
Upvotes: 3