Nick

Reputation: 10539

Parallel get_range() phpcassa

I am trying to build something similar to MapReduce, but without Hadoop.

I plan to use several PHP processes, each calling $cf->get_range($begin, $end) and iterating over every row in its range.

But because of the random partitioner, rows do not come back sorted by key. This means I cannot really choose good $begin and $end values, so it will be difficult to start 30-40 processes in parallel.

Cassandra supports get_range by token, but this is not exposed in phpcassa.
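
To show what I mean by splitting the work by token, here is a rough sketch (in Python only because big integers are built in; it is not phpcassa code). It assumes RandomPartitioner tokens are integers in the range 0 to 2**127, and split_token_ring is a made-up helper name. Each worker would take one (start, end) pair, if the client exposed token ranges:

```python
# Assumption: RandomPartitioner tokens are integers in [0, 2**127].
RING_SIZE = 2 ** 127


def split_token_ring(workers):
    """Return `workers` contiguous (start, end) token pairs covering the ring.

    Hypothetical helper: it only does the arithmetic and never talks to
    Cassandra. How the endpoints are treated (inclusive or exclusive)
    depends on the client call that consumes them.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Integer division keeps every boundary exact, even for 128-bit values.
    bounds = [RING_SIZE * i // workers for i in range(workers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # One line per worker: the token range that worker would scan.
    for start, end in split_token_ring(4):
        print(start, end)
```

With a random partitioner the rows should be spread roughly evenly over the ring, so equal-width token ranges should give each process a similar amount of work.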

I have several possibilities, but I do not like them because they seem unprofessional:

  1. Put all keys in a single row and use ColumnSlice() + multiget() after that.
  2. Put all keys in a single row, but stored under their MD5 values. Then use the MD5 value to look up the key, and do get_range().
  3. Do something similar with a secondary index.
  4. Import all keys into Redis.
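
For option 2, the MD5 idea would look roughly like this (again a Python sketch for illustration, not phpcassa code). md5_bucket is a made-up helper, and the value it computes is a plain unsigned MD5, which I am not claiming is identical to the token Cassandra computes internally:

```python
import hashlib


def md5_bucket(key, workers):
    """Map a row key to a worker index in [0, workers) by its MD5 value.

    Hypothetical helper: the digest is read as an unsigned 128-bit integer
    and scaled down to the number of workers, so keys that are close in
    MD5 order land in the same bucket.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")  # 0 <= value < 2**128
    return value * workers // 2 ** 128


if __name__ == "__main__":
    # Example keys are invented for illustration only.
    for key in ["user:1", "user:2", "user:3"]:
        print(key, md5_bucket(key, 30))
```

Each process would then only handle the keys whose bucket matches its own index.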

Upvotes: 2

Views: 210

Answers (0)
