Fatima

Reputation: 31

What are the right tools for me for indexing and processing big data?

I'm trying to index and store big data, and I'm a bit confused about which tools to use. Let me start by saying I'm a novice here and have only theoretical knowledge on the topic. I want to:

1) use Hadoop (definitely)

2) extract log data from flat files using three different PCs

3) transform the data to structured form and load it into HDFS for indexing and MapReduce.

My questions are:

a) To index three fields, is it possible to do map-index-map-index-map-index-reduce? If not, how is indexing done? If possible, please explain the sequence (e.g. index-map-reduce).

b) What are the right tools to use from extraction to storage?

c) Can Hadoop be used for a simple search, or must another tool such as Lucene/Solr be used?

d) Must the data be converted into structured form, e.g. using PDI, before going through the MapReduce phase?

Upvotes: 2

Views: 330

Answers (2)

alekya reddy

Reputation: 934

I suggest using Elasticsearch or Solr for indexing the big data.
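
For example, with the SolrJ Java client you can push a parsed log record into a Solr collection. This is only a minimal sketch: the core name "logs", the localhost URL, and the field names are assumptions for illustration, and it uses the builder API from recent SolrJ versions (6+).

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexLogRecord {
        public static void main(String[] args) throws Exception {
            // Assumes a Solr core named "logs" running locally (SolrJ 6+ builder API).
            SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build();

            // Field names (host, timestamp, message) are placeholders for whatever
            // structure you extract from the flat log files.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "log-1");
            doc.addField("host", "pc-01");
            doc.addField("timestamp", "2014-05-01T10:00:00Z");
            doc.addField("message", "GET /index.html 200");

            solr.add(doc);    // send the document to the index
            solr.commit();    // make it visible to searches
            solr.close();
        }
    }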

Upvotes: 0

Amar

Reputation: 3845

Well, if you are looking to index data stored in Hadoop, then Cloudera Search is a perfect fit for your use case. Link: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_introducing.html

I currently use it at Goibibo.com for indexing log data. You can use it to index data in real time as well as in MapReduce mode. Internally it uses Solr for indexing and fits your use case well. You can also expose the indexed collections through Hue.
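
To give a feel for how index building looks in MapReduce (your question a), here is a minimal, generic inverted-index job using the plain Hadoop Java API. It is a simplified sketch rather than what Cloudera Search does internally: it assumes whitespace-separated text input and uses the input file name as the document id. The point is that indexing is one map-reduce pass, not map-index-map-index-reduce: the map step emits (term, document) pairs and the reduce step builds the posting list for each term.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class InvertedIndex {

        // Map: emit (term, documentId) for every term in the input line.
        public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String docId = ((FileSplit) context.getInputSplit()).getPath().getName();
                for (String term : value.toString().split("\\s+")) {
                    if (!term.isEmpty()) {
                        context.write(new Text(term.toLowerCase()), new Text(docId));
                    }
                }
            }
        }

        // Reduce: collect the distinct documents each term appears in (the posting list).
        public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text term, Iterable<Text> docIds, Context context)
                    throws IOException, InterruptedException {
                Set<String> postings = new HashSet<String>();
                for (Text docId : docIds) {
                    postings.add(docId.toString());
                }
                StringBuilder list = new StringBuilder();
                for (String id : postings) {
                    if (list.length() > 0) {
                        list.append(",");
                    }
                    list.append(id);
                }
                context.write(term, new Text(list.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "inverted index");
            job.setJarByClass(InvertedIndex.class);
            job.setMapperClass(IndexMapper.class);
            job.setReducerClass(IndexReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }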

Upvotes: 0
