ABCD

Reputation: 41

Pig + Cassandra + Hadoop

I have a Hadoop (2.7.2) setup over a Cassandra (3.7) cluster. Hadoop MapReduce works without problems, and I can create keyspaces and tables in cqlsh. However, I have been trying to install Pig on top of Hadoop so that I can access the tables in Cassandra. The Pig installation itself went fine; accessing Cassandra from it is where I'm having trouble.

I have come across numerous websites, but most are either for outdated versions of Cassandra or just plain vague. The one thing I gleaned from this website is that Cassandra tables can be loaded in Pig using CqlStorage / CqlNativeStorage. However, this support seems to have been removed from the latest version (since 2015), according to the changes noted in the Cassandra Git tree. So my question is: are there any workarounds?

I would mostly be running MapReduce jobs over Cassandra tables and using Pig for querying.

Thanks in advance.

Upvotes: 2

Views: 349

Answers (1)

RussS

Reputation: 16576

All Pig support was deprecated in 2.2 and removed in 3.0: https://issues.apache.org/jira/browse/CASSANDRA-10542

So I think you are a bit out of luck here. You may be able to use the old classes with modern C*, but Pig is very niche right now. Spark SQL is definitely the current favorite child (I may be biased, since I work on the Spark + Cassandra Connector) and allows for very flexible querying of C* data.
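For illustration, here is a minimal PySpark sketch of what querying a Cassandra table through Spark SQL and the connector could look like. It is only a sketch under assumptions: the keyspace, table, and view names passed in are hypothetical, the helper functions are made up for this example, and it presumes a Spark 2.x session that was started with the Spark Cassandra Connector on its classpath and with spark.cassandra.connection.host pointing at a Cassandra node.

```python
# Sketch only: helper names, keyspace, and table are hypothetical placeholders.
# `spark` is expected to be a pyspark.sql.SparkSession created with the
# Spark Cassandra Connector available and the Cassandra host configured.

CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def load_cassandra_table(spark, keyspace, table):
    """Return a DataFrame backed by the given Cassandra table."""
    return (
        spark.read
        .format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table)
        .load()
    )


def query_cassandra_table(spark, keyspace, table, sql, view_name="cassandra_view"):
    """Register the table as a temporary view and run a SQL query against it.

    The SQL text should refer to the table by `view_name`.
    """
    df = load_cassandra_table(spark, keyspace, table)
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql)
```

With a real session this might be called as query_cassandra_table(spark, "my_keyspace", "my_table", "SELECT * FROM cassandra_view LIMIT 10").show(), which covers roughly the same ground as a Pig LOAD followed by a filter or projection. The exact connector version to use depends on the Spark and Scala versions in the cluster.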

Upvotes: 1
