nicq

Practical usage of Hadoop in a project

I'm going to use Hadoop in my new project. The project concept is shown in the following image: [image]

Each user has a device that produces data (logs). The user can export this data from the device as a file and upload it to the web app/Hadoop. I'm going to build the web app using Ruby on Rails.

I know some basics of Hadoop (HDFS, Mappers, Reducers), but I don't know how to use Hadoop in a practical way. The project is only a concept for now, because I would like to get some tips first and then adjust the project's components accordingly.

My major considerations are:

  1. Should the web app be hosted in the same location as Hadoop (the same server/cloud/service provider)?
  2. How should files be uploaded? Can it be done through an upload form in the web app, or is it better to create a desktop application for uploading files (expected file size: 100 MB to 1 GB)?
  3. If it is a desktop application, is it better to send data directly to Hadoop or to route it through the web application?
  4. Can you point me to useful frameworks/tools/APIs/resources for uploading from external sources (my web app or desktop application)?
  5. How do I use the Hadoop data correctly? Let's assume that a user's file arrives in Hadoop. I know that I can run Mappers and Reducers on this file, which produces an output file (or puts some data into HBase, am I right?). To get this data I need the proper output file, or I need to run some kind of "SELECT" against HBase, am I right? Does Hadoop provide any trigger to notify an external web app when a job is done?

I appreciate any hints on this topic.

Answers (1)

Oleksandr Pryimak

  1. Do not run your web app on the same machines as Hadoop. It is better to use dedicated containers/machines for your Hadoop cluster.
  2. HDFS (the Hadoop file system) has an API for reading and writing files. For example, there is WebHDFS, a REST API over HTTP.
  3. It is always better to send files through your web app, because that way you can authenticate the client properly.
  4. I do not get this. In my opinion, uploading is easy and one does not need any libraries to achieve this.
  5. Do not query data from HDFS directly. Export it to some other storage afterwards. If you insist on using it directly, that is not a big problem: just use WebHDFS.
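Points 2, 3 and 5 can be made more concrete with a minimal Python sketch of a WebHDFS client, meant to run on the web-app side after the web app has authenticated the user. It uses only the standard library and the WebHDFS `op=CREATE` and `op=OPEN` operations. It assumes WebHDFS is enabled and that the cluster uses simple (non-Kerberos) authentication, so the user name can be passed as `user.name`; a Kerberos-secured cluster needs SPNEGO instead. The host name and all paths below are hypothetical placeholders.

```python
import os
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_url(namenode, hdfs_path, op, user=None, **params):
    """Build a WebHDFS REST URL of the form <namenode>/webhdfs/v1/<path>?op=<OP>&..."""
    query = {"op": op}
    if user is not None:
        # Simple (non-Kerberos) auth: the caller's user name is a query parameter.
        query["user.name"] = user
    query.update(params)
    path = urllib.parse.quote(hdfs_path.lstrip("/"))
    return f"{namenode}/webhdfs/v1/{path}?{urllib.parse.urlencode(query)}"


def upload_file(namenode, local_path, hdfs_path, user):
    """Two-step WebHDFS CREATE: the NameNode redirects (307) to a DataNode, which receives the bytes."""
    url = webhdfs_url(namenode, hdfs_path, "CREATE", user=user, overwrite="true")
    try:
        # Step 1: PUT with no body. urllib does not follow a 307 for PUT and raises
        # HTTPError instead, which lets us read the DataNode address from Location.
        urllib.request.urlopen(urllib.request.Request(url, method="PUT"))
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")

    # Step 2: stream the local file (possibly 1 GB) to the DataNode without
    # loading it into memory; WebHDFS answers 201 Created on success.
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as body:
        req = urllib.request.Request(
            datanode_url,
            data=body,
            method="PUT",
            headers={"Content-Type": "application/octet-stream",
                     "Content-Length": str(size)},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status


def read_file(namenode, hdfs_path, user):
    """WebHDFS OPEN: for GET, urllib follows the 307 redirect to a DataNode automatically."""
    url = webhdfs_url(namenode, hdfs_path, "OPEN", user=user)
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

For example, `upload_file("http://namenode.example.com:9870", "device.log", "/user/alice/logs/device.log", "alice")` should return 201 on success. Here `namenode.example.com` and `alice` are hypothetical, and 9870 is the default NameNode HTTP port on Hadoop 3 (50070 on Hadoop 2). `read_file` can fetch a job's output file the same way, although copying results into a separate store is usually the better way to serve them to a web app.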
