Parallel get_range() phpcassa

Question

I am trying to build something similar to MapReduce, but without Hadoop.

I plan to use several PHP processes, each calling $cf->get_range($begin, $end) and iterating over every row in its range.

But because of the random partitioner, the rows do not come back sorted by key. This means I cannot really choose good $begin and $end values, so it will be difficult to start 30-40 processes in parallel.

Cassandra supports get_range by token, but this is not exposed in phpcassa.

I have several possibilities, but I do not like them because they do not seem professional:

Put all keys in a single row and use ColumnSlice() + multiget() after that.
Put all keys in a single row, but with their MD5 values. Then use the MD5 value to get the key, and do get_range().
Do something similar with a secondary index.
Import all keys into Redis.
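
To make the MD5 option more concrete, here is a minimal sketch of how the MD5 space could be divided into one contiguous range per worker. It is plain PHP and does not call phpcassa at all; md5_ranges() and worker_for_key() are hypothetical helper names. It assumes the MD5 values are stored as lowercase 32-character hex strings and that the index row sorts them in byte order, so each [begin, end] pair could serve as the start and end of one worker's slice.

```php
<?php
// Hypothetical helpers, not part of phpcassa.

// Split the MD5 hex space into $workers contiguous, non-overlapping ranges.
// Only the first 4 hex digits (16 bits) are used to place the boundaries,
// so at most 65536 ranges can be produced.
function md5_ranges(int $workers): array
{
    if ($workers < 1 || $workers > 0x10000) {
        throw new InvalidArgumentException('workers must be between 1 and 65536');
    }
    $space = 0x10000; // number of distinct 4-digit hex prefixes
    $ranges = [];
    for ($i = 0; $i < $workers; $i++) {
        $lo = intdiv($i * $space, $workers);
        $hi = intdiv(($i + 1) * $space, $workers) - 1;
        $ranges[] = [
            sprintf('%04x', $lo) . str_repeat('0', 28), // inclusive begin
            sprintf('%04x', $hi) . str_repeat('f', 28), // inclusive end
        ];
    }
    return $ranges;
}

// Return the index of the range that the MD5 of $key falls into.
// strcmp() is enough because all strings are lowercase hex of equal length.
function worker_for_key(string $key, array $ranges): int
{
    $hash = md5($key);
    foreach ($ranges as $i => [$begin, $end]) {
        if (strcmp($hash, $begin) >= 0 && strcmp($hash, $end) <= 0) {
            return $i;
        }
    }
    throw new LogicException('ranges do not cover the MD5 space');
}
```

Each process would take one pair from md5_ranges(30), for example, and read only the index entries between its begin and end values. Because MD5 output is close to uniform, the ranges should hold roughly the same number of keys.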

