Reputation: 272
When trying to run PIG against a CQL3 created Cassandra Schema,
-- This script simply gets a row count of the given column family
rows = LOAD 'cassandra://Keyspace1/ColumnFamily/' USING CassandraStorage();
counted = foreach (group rows all) generate COUNT($1);
dump counted;
I get the following Error.
Error: Column family 'ColumnFamily' not found in keyspace 'KeySpace1'
I understand that this is by design, but I have been having trouble finding the correct method to load CQL3 tables into PIG.
Can someone point me in the right direction? Is there a missing bit of documentation?
Upvotes: 1
Views: 505
Reputation: 16576
The best way to access Cql3 Tables in Pig is by using the CqlStorage Handler
The syntax is similar to what you have a above
row = Load 'cql://Keyspace/ColumnFamily/' Using CqlStorage()
More info In the Dev Blog Post
Upvotes: 0
Reputation: 161
As e90jimmy said, its supported in Cassandra 1.2.8, but we have a issue when using counter column type. This was fixed by Alex Liu but due to regression problem in 1.2.7 the patch doesn't go ahead:
https://issues.apache.org/jira/browse/CASSANDRA-5234
To correct this, wait until 2.0 become production ready or download the source, apply the patch from the above link by yourself and rebuild the cassandra .jar. Worked for me by now...
Upvotes: 0
Reputation: 1791
Per this https://github.com/alexliu68/cassandra/pull/3, it appears that this fix is planned for the 1.2.6 release of Cassandra. It sounds like they're trying to get that out in the reasonably near future, but of course there's no certain ETA.
Upvotes: 0
Reputation: 14153
As you mention this is by design because if thrift was updated to allow for this it would compromise backwards computability. Instead of creating keyspaces and column families using CQL (I'm guessing you used cqlsh) try using the C* CLI.
Take a look at these issues as well:
Upvotes: 0