Alain

Generate a Cassandra-friendly relation in Pig

Using the Pig example from DataStax, you can load data from Cassandra with:

cassandra_data = LOAD 'cassandra://PigDemo/Scores' USING CassandraStorage()
   AS (name, columns: bag {T: tuple(score, value)});

Next, you can, for example, compute aggregates with:

total_scores = FOREACH cassandra_data GENERATE name, COUNT(columns.score) as counts,
   LongSum(columns.score) as total;

After reading the Pig reference manual, it is still not obvious to me how I can rewrite or extend the code above to produce a relation that I can store back into Cassandra. It should have the format:

(<row_key>,{(<column_name1>,<value1>),(<column_name2>,<value2>)})

In our case:

(name,{('counts',counts),('total',total)})

I have tried, without success, to use AS with an explicit schema, and I also tried adding an extra GROUP statement:

grouped = GROUP total_scores BY name;
cass_in = FOREACH grouped GENERATE group, total_scores.(counts, total);

However, I feel there must be a straightforward way that I am missing. Any help is appreciated.


Answers (1)

libjack

Use the built-in TOBAG() and TOTUPLE() functions (available since Pig 0.8):

cass_in = FOREACH grouped GENERATE group,
    TOBAG(TOTUPLE('counts', total_scores.counts), TOTUPLE('total', total_scores.total));
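
Since total_scores already holds exactly one tuple per name, the extra GROUP may not be needed at all. After grouping, total_scores.counts and total_scores.total are bags rather than plain values. Applying the same two functions directly to total_scores keeps each column value a scalar. The following is a minimal sketch, not tested against a live cluster. It assumes the CassandraStorage loader from the question can also be used for STORE, and that a target column family named ScoreTotals (a hypothetical name) already exists in the PigDemo keyspace:

```pig
-- One output tuple per row key: (name, {('counts', counts), ('total', total)})
cass_out = FOREACH total_scores GENERATE
    name,
    TOBAG(TOTUPLE('counts', counts), TOTUPLE('total', total));

-- Print the resulting schema to check it before storing
DESCRIBE cass_out;

-- 'ScoreTotals' is a hypothetical column family; it must be created beforehand
STORE cass_out INTO 'cassandra://PigDemo/ScoreTotals' USING CassandraStorage();
```

Skipping the GROUP also avoids an unnecessary reduce phase, because the aggregation that produced one row per name has already been done.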
