I have an issue reading a table containing more than 800k rows. I need to read the rows from top to bottom in order to process them.
I use Scala and Phantom for this.
Here is how my table looks:
CREATE TABLE raw (
id uuid PRIMARY KEY,
b1 text,
b2 timestamp,
b3 text,
b4 text,
b5 text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
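For reference, the Phantom mapping I query against looks roughly like this (simplified; RawRecord and RawTable are just the names I use here):
import java.util.UUID
import org.joda.time.DateTime
import com.websudos.phantom.dsl._

case class RawRecord(
  id: UUID,
  b1: String,
  b2: DateTime,
  b3: String,
  b4: String,
  b5: String
)

class RawTable extends CassandraTable[RawTable, RawRecord] {
  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object b1 extends StringColumn(this)
  object b2 extends DateTimeColumn(this)
  object b3 extends StringColumn(this)
  object b4 extends StringColumn(this)
  object b5 extends StringColumn(this)

  // map a Cassandra row into the case class
  override def fromRow(row: Row): RawRecord =
    RawRecord(id(row), b1(row), b2(row), b3(row), b4(row), b5(row))
}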
So far I've tried to read the table using:
def getAllRecords: Future[Seq[RawRecord]] = select.fetch
or the fancier Play Enumerator, combined with an Iteratee:
def getAllRecords: Enumerator[RawRecord] = select.fetchEnumerator
Neither of these works; it seems like Cassandra, the driver, or my program always tries to read all the records upfront. What am I missing here?
Upvotes: 0
Views: 390
Reputation: 28511
Have you tried reviewing the code in the bigger read tests?
import java.util.concurrent.atomic.AtomicLong

class IterateeBigReadPerformanceTest extends BigTest with ScalaFutures {

  it should "read the correct number of records found in the table" in {
    val counter: AtomicLong = new AtomicLong(0)

    // Stream the rows through an enumerator: the iteratee sees one
    // record at a time, so the full result set is never materialised.
    val result = TestDatabase.primitivesJoda.select
      .fetchEnumerator run Iteratee.forEach { r =>
        counter.incrementAndGet()
      }

    result.successful { query =>
      info(s"done, reading: ${counter.get}")
      counter.get() shouldEqual 2000000
    }
  }
}
This is not something that will read your records upfront. In fact, we have tests that run for more than an hour to guarantee that GC pauses stay acceptable, there is no GC overhead, permgen/metaspace pressure remains within bounds, and so on.
If anything has indeed changed, it is only by error; this should still work.
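Applied to your raw table, draining the enumerator one record at a time would look roughly like this (a sketch, assuming the RawTable mapping from your question and phantom's play-iteratee support on the classpath; processAll is a name I made up):
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import com.websudos.phantom.dsl._
import com.websudos.phantom.iteratee.Iteratee

object RawReader {
  // Rows arrive from the driver page by page; only the current page is
  // held in memory, never the full 800k-row result set.
  def processAll(table: RawTable)(implicit session: Session, space: KeySpace): Future[Long] = {
    val counter = new AtomicLong(0)
    (table.select.fetchEnumerator run Iteratee.forEach { record =>
      counter.incrementAndGet() // replace with your per-row processing
    }).map(_ => counter.get())
  }
}
The iteratee only ever sees one record at a time, so memory stays flat regardless of how many rows the table holds.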
Upvotes: 1