Paul C

Reputation: 8507

Cassandra and Pig integration - Is Hadoop optional?

I'm trying to set up a trial Cassandra + Pig cluster. The Cassandra wiki makes it sound like you need Hadoop to integrate with Pig.

But the README in cassandra-src/contrib/pig makes it sound like you can run Pig on Cassandra without Hadoop.

If Hadoop is optional, what do you lose by not using it?

Upvotes: 4

Views: 791

Answers (2)

nickmbailey

Reputation: 3684

Hadoop is only optional when you are testing things out. To do anything at scale, you will need Hadoop as well.

Running without Hadoop means you are running Pig in local mode, which means all the data is processed by the single Pig process you launched. This works fine with a single node and example data.
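To make "everything runs in one process" concrete, here is a toy Python sketch (not Pig or Cassandra code). It assumes a hypothetical in-memory stand-in for a column family and does the equivalent of a GROUP/COUNT in a single loop, which is roughly what local mode amounts to: every row passes through the one process you started.

```python
from collections import Counter

# Hypothetical stand-in for rows read from a Cassandra column family:
# (row_key, {column_name: value}) pairs. This is not a real Cassandra API.
ROWS = [
    ("user1", {"country": "US"}),
    ("user2", {"country": "DE"}),
    ("user3", {"country": "US"}),
    ("user4", {"country": "FR"}),
]


def run_local(rows):
    """Mimic Pig local mode: one process reads every row and does all the work."""
    counts = Counter()
    for _key, columns in rows:           # every row flows through this single loop
        counts[columns["country"]] += 1  # roughly GROUP BY country + COUNT
    return dict(counts)


print(run_local(ROWS))  # {'US': 2, 'DE': 1, 'FR': 1}
```

That is fine for a handful of rows, but the work and memory all land on one machine, which is why it stops being practical as the data grows.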

When running with any significant amount of data or multiple machines, you want to run Pig in Hadoop (MapReduce) mode. By running Hadoop TaskTrackers on your Cassandra nodes, Pig can take advantage of what MapReduce provides: distributing the workload across machines and using data locality to reduce network transfer.
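A toy Python simulation of the data-locality idea follows. It is a sketch, not how Hadoop's scheduler is actually implemented. The split layout, node names, and the `schedule` helper are all hypothetical. Each split (a token range) lists the Cassandra nodes holding a replica. The scheduler prefers a TaskTracker on one of those nodes, so the map task reads rows locally, and only the small per-split partial counts have to be merged.

```python
from collections import Counter

# Hypothetical splits: token ranges of a column family, the Cassandra nodes
# holding a replica of each, and its rows as (row_key, country) pairs.
SPLITS = [
    {"replicas": ["node-a"], "rows": [("user1", "US"), ("user2", "DE")]},
    {"replicas": ["node-b"], "rows": [("user3", "US")]},
    {"replicas": ["node-c"], "rows": [("user4", "FR"), ("user5", "US")]},
]
# Hypothetical: nodes that also run a Hadoop TaskTracker (here, every node).
TASK_TRACKERS = ["node-a", "node-b", "node-c"]


def schedule(split, trackers):
    """Prefer a tracker co-located with a replica; otherwise fall back to any."""
    for node in split["replicas"]:
        if node in trackers:
            return node, True    # local read: rows never cross the network
    return trackers[0], False    # remote read: rows are pulled over the network


def map_task(rows):
    """Map phase: count rows per country for a single split."""
    return Counter(country for _key, country in rows)


def run_mapreduce(splits, trackers):
    total = Counter()
    local_reads = 0
    for split in splits:
        _node, is_local = schedule(split, trackers)
        local_reads += is_local
        # On a real cluster the map tasks run in parallel on their nodes;
        # here they run one after another purely for illustration.
        total.update(map_task(split["rows"]))  # reduce: merge partial counts
    return dict(total), local_reads


counts, local = run_mapreduce(SPLITS, TASK_TRACKERS)
print(counts, local)  # {'US': 3, 'DE': 1, 'FR': 1} 3
```

If you put a TaskTracker on only one node (for example `run_mapreduce(SPLITS, ["node-a"])`), the result is the same but only one of the three splits is read locally. That mirrors why co-locating TaskTrackers with Cassandra nodes matters.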

Upvotes: 6

ligerdave

Reputation: 772

It's optional. Cassandra has its own implementation of Pig's LoadFunc and StoreFunc, which let you query and store data.
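The real Cassandra loader/storer is a Java class. The toy Python analog below only shows the shape of the two interfaces and is a hypothetical illustration, not Cassandra's or Pig's API. `get_next` mirrors `LoadFunc.getNext()`, which returns null (here `None`) once the input is exhausted. `put_next` mirrors `StoreFunc.putNext()`, which receives each output tuple. The class names and the dict-based "column family" are assumptions made for the sketch.

```python
from typing import Iterator, Optional, Tuple

Row = Tuple[str, dict]  # (row_key, {column_name: value})


class ToyCassandraLoader:
    """Hypothetical analog of a Pig LoadFunc: hands rows over one tuple at a time."""

    def __init__(self, column_family: dict):
        self._rows: Iterator[Row] = iter(column_family.items())

    def get_next(self) -> Optional[Row]:
        # Like LoadFunc.getNext(): None signals there is no more input.
        return next(self._rows, None)


class ToyCassandraStorer:
    """Hypothetical analog of a Pig StoreFunc: receives output rows one by one."""

    def __init__(self, column_family: dict):
        self._cf = column_family

    def put_next(self, row: Row) -> None:
        key, columns = row
        # Merge columns into the row, similar in spirit to a Cassandra upsert.
        self._cf.setdefault(key, {}).update(columns)


source = {"user1": {"country": "US"}, "user2": {"country": "DE"}}
target = {}
loader, storer = ToyCassandraLoader(source), ToyCassandraStorer(target)
while (row := loader.get_next()) is not None:
    key, cols = row
    storer.put_next((key, {"country_upper": cols["country"].upper()}))
print(target)  # {'user1': {'country_upper': 'US'}, 'user2': {'country_upper': 'DE'}}
```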

Hadoop and Cassandra are different in many ways. It's hard to say what you lose without knowing exactly what you are trying to accomplish.

Upvotes: -1
