user2325219

Reputation: 1

Cassandra Hadoop - is it possible to read and write to the same column family

Using Cassandra 1.1, is it possible to have a Hadoop job that reads from Column Family X and "updates" it at the same time? That is, specify X as the Input Column Family and then:

  1. in the map step, update the same CF (e.g. via Hector), or
  2. if #1 is not possible, update the same CF in the reduce step (directly via Hector, or alternatively by specifying the CF as the output column family).

What we are trying to do is this: we have (potentially very wide) rows that we will be reading in. In the map() method, we iterate through the columns of each row, and as each column is processed we no longer need it, so we plan to "expire" it by rewriting it in Cassandra with TTL = 1 sec.
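For illustration, a minimal sketch of that map-side plan against Cassandra 1.1's ColumnFamilyInputFormat, using Hector to re-write each processed column with a 1-second TTL. The cluster, keyspace, column family name "X" and the host are placeholders, and error handling is omitted:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.SortedMap;

    import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import org.apache.cassandra.db.IColumn;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch only: processes each column of the incoming row, then re-inserts it
    // with TTL = 1 so Cassandra expires it shortly afterwards.
    public class ExpireColumnsMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, LongWritable> {

        private Keyspace keyspace;

        @Override
        protected void setup(Context context) {
            // Placeholder cluster/keyspace names -- replace with your own.
            Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "cassandra-host:9160");
            keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
        }

        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
                throws IOException, InterruptedException {
            Mutator<ByteBuffer> mutator = HFactory.createMutator(keyspace, ByteBufferSerializer.get());
            for (IColumn column : columns.values()) {
                // ... process the column, emit whatever the job needs ...

                // Re-insert the same column with a 1-second TTL so it expires.
                HColumn<ByteBuffer, ByteBuffer> expiring =
                        HFactory.createColumn(column.name(), column.value(),
                                ByteBufferSerializer.get(), ByteBufferSerializer.get());
                expiring.setTtl(1);
                mutator.addInsertion(key, "X", expiring);
            }
            mutator.execute();
        }
    }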

If it's not possible or advisable to do that in the map step, then we are prepared to do it in the reduce step. However, we would prefer the map step, since doing it in the reduce step means we would need to give the reduce() method enough information to identify the row+column we are trying to expire, which in turn means our map step would have to include that information in its output - something we are trying to avoid if possible.

So again, is it possible to do this using either #1 or #2?

Upvotes: 0

Views: 309

Answers (1)

odiszapc

Reputation: 4109

First, you can do anything in your map or reduce steps. So, yes, it is possible.

It's possible to write to the same column family in the reduce step, because the map and reduce steps are executed sequentially. Feel free to update any column family in the reduce step.
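A minimal sketch of such a reducer, assuming the job uses Cassandra's ColumnFamilyOutputFormat and that the map output carries the row key plus the column names to expire (the key/value types and class names here are assumptions, not your actual job):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch only: turns (row key, column name) pairs emitted by the mapper into
    // TTL = 1 re-writes that ColumnFamilyOutputFormat applies to the column family.
    public class ExpireColumnsReducer
            extends Reducer<Text, BytesWritable, ByteBuffer, List<Mutation>> {

        @Override
        protected void reduce(Text rowKey, Iterable<BytesWritable> columnNames, Context context)
                throws IOException, InterruptedException {
            List<Mutation> mutations = new ArrayList<Mutation>();
            for (BytesWritable name : columnNames) {
                Column column = new Column();
                column.setName(ByteBuffer.wrap(name.getBytes(), 0, name.getLength()));
                // The value is irrelevant here: the column expires one second later anyway.
                column.setValue(ByteBufferUtil.EMPTY_BYTE_BUFFER);
                column.setTimestamp(System.currentTimeMillis() * 1000);
                column.setTtl(1);

                Mutation mutation = new Mutation();
                mutation.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(column));
                mutations.add(mutation);
            }
            // ColumnFamilyOutputFormat expects a ByteBuffer row key and a list of mutations.
            context.write(ByteBufferUtil.bytes(rowKey.toString()), mutations);
        }
    }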

About the map step: it is possible to write to the same column family in the map step with the Hector/Thrift API, but this is bad practice. First, the map step is designed for reading data only: it iterates over rows using the fast, low-level Cassandra reader implementation in Hadoop, so it runs quickly. With Hector in the loop, your map step will be much slower.

If the data you want to delete in the map step will never be used in later steps, you can do it, but I repeat: writing to the dataset you are iterating over in the map step is bad practice. If your map-reduce job fails (for any reason), the data you expired in the map step may be lost in an inconsistent way: the columns were already deleted in the map step, but the reducer will never see them because the job failed.

Map-reduce rule: the data you iterate over should be modified in a successive manner. First iterate over the dataset, then modify it; don't do both simultaneously.

Answering your question: it is possible in both cases, but #2 is the valid approach. You should use the reduce step for write/delete operations.
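If you go the output-column-family route, the driver simply points both the input and the output side of the job at the same column family via ConfigHelper. A rough, hypothetical setup (keyspace, column family name, host and partitioner below are placeholders):

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch only: reads from and writes back to the same column family "X".
    public class ExpireJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "expire-processed-columns");
            job.setJarByClass(ExpireJobDriver.class);
            // job.setMapperClass(...) / job.setReducerClass(...) as appropriate for your job.

            Configuration conf = job.getConfiguration();

            // Input side: keyspace "MyKeyspace", column family "X" (placeholders).
            ConfigHelper.setInputInitialAddress(conf, "cassandra-host");
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "X");
            ConfigHelper.setInputSlicePredicate(conf,
                    new SlicePredicate().setSlice_range(new SliceRange(
                            ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
                            false, Integer.MAX_VALUE)));

            // Output side: the very same column family.
            ConfigHelper.setOutputInitialAddress(conf, "cassandra-host");
            ConfigHelper.setOutputRpcPort(conf, "9160");
            ConfigHelper.setOutputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setOutputColumnFamily(conf, "MyKeyspace", "X");

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }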

P.S. It seems you are trying to use Hadoop as a garbage collector - that is not what it was designed for.

Upvotes: 1
