Reputation: 3272
I'm new to big data. I learned that HDFS is mostly for storing structured data and HBase is for storing unstructured data. I have a REST API from which I need to fetch data and load it into the data warehouse (HDFS/HBase). The data is in JSON format. Which one would be better to load the data into: HDFS or HBase? Also, can you please point me to a tutorial on how to do this? I came across this Tutorial with Streaming Data, but I'm not sure whether it fits my use case.
It would be a great help if you could point me to a particular resource or technology to solve this.
Upvotes: 1
Views: 855
Reputation: 1059
There are several questions you have to think about:
Do you want to work with batch files or streaming? That depends on the rate at which your REST API will be queried.
For storage, there is not just HDFS and HBase; there are many other solutions, such as Cassandra, MongoDB, and Neo4j. It all depends on how you want to use the data (random access vs. full scan, updates with versioning vs. writing new rows, concurrent access). For example, HBase is good for random access and Neo4j for graph storage. If you are receiving JSON files, MongoDB can be a good choice, as it stores objects as documents.
What is the size of your data?
Here is a good article on questions to think about when you start a big data project: documentation
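If you go the batch route, a minimal sketch of the first step could look like the following: pull the JSON from the REST API and write it out as one JSON object per line, a layout that most batch tools can split line by line. This is only an illustration under assumptions: the function names, the endpoint URL, and the file name are hypothetical, and it assumes the API returns a JSON array of objects.

```python
import json
import urllib.request


def fetch_records(url):
    """Fetch records from a REST endpoint.

    Assumption: the endpoint returns a JSON array of objects.
    """
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def write_json_lines(records, path):
    """Write one JSON object per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


# Example usage (hypothetical URL and file name, replace with your own):
# records = fetch_records("https://example.com/api/records")
# write_json_lines(records, "records.jsonl")
```

The resulting file can then be copied into HDFS with something like `hdfs dfs -put records.jsonl <target directory>`. If you need random access to individual records instead, you would load them into HBase or MongoDB with their own clients rather than writing a flat file.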
Upvotes: 1