keelar

Reputation: 6026

How does CqlConfigHelper.setOutputCql() work?

I am following the hadoop_cql3_word_count example in Cassandra and have questions about the following code segment:

    String query =
        "UPDATE " + KEYSPACE + "." + OUTPUT_COLUMN_FAMILY +
        " SET count_num = ? ";
    CqlConfigHelper.setOutputCql(job.getConfiguration(), query);

My questions are:

  1. What does the question mark (i.e., ?) in the above query mean? Does Cassandra replace it with some value when the query is processed?
  2. If I would like to update multiple columns of a row given its key, how should I modify the above update statement?

Thank you,

Upvotes: 2

Views: 797

Answers (1)

RussS

Reputation: 16576

The ? represents a slot for a variable in a prepared statement. The values your MR job emits are bound to the ?s in order, one statement execution per output record.

If your MR results looked like (key=key1, 1) (key=key2, 2) (key=key3, 3), then the statements executed would be:

    UPDATE Keyspace.columnfamily SET count_num = 1 WHERE key = key1
    UPDATE Keyspace.columnfamily SET count_num = 2 WHERE key = key2
    UPDATE Keyspace.columnfamily SET count_num = 3 WHERE key = key3

To update multiple columns, add more assignments to the prepared statement (one ? per column) and make sure your MapReduce job provides a value for each ?, in the same order.
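As a rough sketch of what "providing a value for each ?" could look like on the reducer side, the snippet below builds the two pieces a reducer in the style of hadoop_cql3_word_count would pass to context.write: a map of key columns and an ordered list of bind values. It assumes a statement like SET count_num = ?, last_doc = ?, where last_doc is a hypothetical text column invented for illustration, and it uses a small local bytes() helper as a stand-in for ByteBufferUtil.bytes so it compiles without Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiColumnOutputSketch {

    // Stand-in for Cassandra's ByteBufferUtil.bytes(String) (assumption:
    // UTF-8 encoding), so the sketch has no external dependencies.
    public static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Key columns of the output row; the connector uses these for the WHERE clause.
    public static Map<String, ByteBuffer> keys(String word) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("word", bytes(word));
        return keys;
    }

    // One value per ?, in the same order as the SET clause:
    //   SET count_num = ?, last_doc = ?   (last_doc is hypothetical)
    public static List<ByteBuffer> variables(int sum, String lastDoc) {
        List<ByteBuffer> variables = new ArrayList<>();
        variables.add(bytes(String.valueOf(sum))); // first ?  -> count_num
        variables.add(bytes(lastDoc));             // second ? -> last_doc
        return variables;
    }

    public static void main(String[] args) {
        // In a real reducer this pair would go to context.write(keys, variables).
        Map<String, ByteBuffer> k = keys("pizza");
        List<ByteBuffer> v = variables(4, "doc7");
        System.out.println(k.keySet() + " with " + v.size() + " bind values");
    }
}
```

The point of the sketch is only the ordering: the list passed as the value must line up with the ?s in the statement given to setOutputCql.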

In the word count (WC) example, the reducer does the following:

    keys.put("row_id1", ByteBufferUtil.bytes(partitionKeys[0]));
    keys.put("row_id2", ByteBufferUtil.bytes(partitionKeys[1]));
    ...
    keys.put("word", ByteBufferUtil.bytes(word.toString()));
    variables.add(ByteBufferUtil.bytes(String.valueOf(sum)));         

    ...
    context.write(keys, getBindVariables(word, sum));

This makes the reducer output look like ({row_id1=1, row_id2=3, word=pizza}, 4), and the prepared statement will be executed like:

    UPDATE cql3_worldcount.output_words SET count_num = 4 WHERE row_id1 = 1 AND row_id2 = 3 AND word = pizza;

If I wanted a prepared statement with multiple columns, it would look like:

    UPDATE test SET a = ?, b = ?, c = ?, d = ?

The rest (WHERE key = ...) gets filled in by the connector. With a real prepared statement we would fill in the key ourselves as well, but here the connector to Cassandra just uses whatever key mappings you have in your reducer output. So

    ({key='mykey'}, (1, 2, 3, 4))

becomes

    UPDATE test SET a = 1, b = 2, c = 3, d = 4 WHERE key = 'mykey'
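To make the order-of-binding idea concrete, here is a small self-contained Java sketch that mimics the expansion shown above as plain string manipulation. It is an illustration only: the class and method names are made up, and a real prepared statement binds values on the server rather than splicing them into the query text, so this is not how the connector is implemented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class BindOrderSketch {

    // Replaces each ? with the next value, in order, then appends a WHERE
    // clause built from the key map. Purely illustrative string handling.
    public static String render(String cql, Map<String, String> keys, List<String> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : cql.toCharArray()) {
            if (c == '?') {
                if (next >= values.size()) {
                    throw new IllegalArgumentException("more ? than values");
                }
                out.append(values.get(next++));
            } else {
                out.append(c);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("more values than ?");
        }
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        StringJoiner where = new StringJoiner(" AND ", " WHERE ", "");
        for (Map.Entry<String, String> e : keys.entrySet()) {
            where.add(e.getKey() + " = " + e.getValue());
        }
        return out.append(where.toString()).toString();
    }

    public static void main(String[] args) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("key", "'mykey'");
        List<String> values = Arrays.asList("1", "2", "3", "4");
        // prints: UPDATE test SET a = 1, b = 2, c = 3, d = 4 WHERE key = 'mykey'
        System.out.println(render("UPDATE test SET a = ?, b = ?, c = ?, d = ?", keys, values));
    }
}
```

If the reducer emits fewer or more values than there are ?s, this sketch throws; it is a reminder that the statement and the reducer output have to agree on both count and order.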

For more information on prepared statements in general, see the SO question about prepared statements in CQL.

Upvotes: 1
