Reputation: 5905
Using the Pig example from Datastax, you can load data from Cassandra with:
cassandra_data = LOAD 'cassandra://PigDemo/Scores' USING CassandraStorage()
AS (name, columns: bag {T: tuple(score, value)});
Next, you can compute aggregates, for example:
total_scores = FOREACH cassandra_data GENERATE name, COUNT(columns.score) as counts,
LongSum(columns.score) as total;
After reading the Pig reference manual, it is not obvious to me how I can rewrite or extend the above code to produce a relation that I can store back into Cassandra. It should have the format
(<row_key>,{(<column_name1>,<value1>),(<column_name2>,<value2>)})
In our case
(name,{('counts',counts),('total',total)})
I tried using AS with an explicit schema, without success, and I also tried an additional GROUP statement:
grouped = GROUP total_scores by name;
cass_in = FOREACH grouped GENERATE group, total_scores.(counts,total);
However, I feel there must be a straightforward way that I am missing. Any help is appreciated.
Upvotes: 1
Views: 956
Reputation: 6443
Use the built-in TOBAG() and TOTUPLE() functions (available since Pig 0.8):
FOREACH grouped GENERATE group, TOBAG(TOTUPLE('counts', total_scores.counts), TOTUPLE('total', total_scores.total));
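Note that after a GROUP, total_scores.counts and total_scores.total are bags rather than scalars. Since total_scores already has one row per name, a variant that skips the GROUP and builds the bag directly from the scalar fields may be closer to the requested (name,{('counts',counts),('total',total)}) shape. This is only a sketch: the target column family ScoreTotals is a hypothetical name, and it assumes CassandraStorage accepts this relation for writing in your setup.

```pig
-- total_scores has scalar fields name, counts and total (one row per name),
-- so the column bag can be built without an extra GROUP.
cass_in = FOREACH total_scores GENERATE
    name,
    TOBAG(TOTUPLE('counts', counts), TOTUPLE('total', total));

-- Inspect the resulting schema before writing anything back.
DESCRIBE cass_in;

-- Assumption: 'ScoreTotals' is a hypothetical column family in the PigDemo keyspace.
STORE cass_in INTO 'cassandra://PigDemo/ScoreTotals' USING CassandraStorage();
```

Checking the DESCRIBE output first is a cheap way to confirm that the relation is a row key followed by a bag of (column_name, value) tuples.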
Upvotes: 1