Muthu

Reputation: 2685

Cassandra secondary index get_indexed_slices timing out

I am using Cassandra 0.8 with two secondary indexes on the columns "DeviceID" and "DayOfYear". I have these two indexes so I can retrieve data for a device within a range of dates. Whenever I get a date filter, I convert it into a DayOfYear and query with indexed slices through the .NET Thrift API. I also cannot upgrade the database at the moment.

My problem is that I usually have no issues retrieving rows with the get_indexed_slices query for the current date (using the current day of year). But whenever I query for yesterday's day of year (which is one of the indexed columns), the first query times out. Most of the time it returns on the second attempt, and it always returns by the third.
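
For reference, the call is essentially the following, shown here as a rough Java Thrift sketch rather than the .NET code I actually use; the keyspace, column family name and literal values are placeholders:

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class IndexedSliceExample {
    public static void main(String[] args) throws Exception {
        // Framed transport, matching the thrift_framed_transport_size_in_mb setting below.
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                      // placeholder keyspace name

        // Equality expressions on the two indexed columns (both stored as doubles).
        IndexExpression byDevice = new IndexExpression(
                utf8("DeviceID"), IndexOperator.EQ, doubleBytes(42.0));
        IndexExpression byDay = new IndexExpression(
                utf8("DayOfYear"), IndexOperator.EQ, doubleBytes(200.0));  // e.g. yesterday

        IndexClause clause = new IndexClause();
        clause.addToExpressions(byDevice);
        clause.addToExpressions(byDay);
        clause.setStart_key(ByteBuffer.wrap(new byte[0]));
        clause.setCount(1000);                                  // page size for matching rows

        // Return every column of each matching row.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(utf8(""), utf8(""), false, 1000));

        List<KeySlice> rows = client.get_indexed_slices(
                new ColumnParent("DeviceData"),                 // placeholder column family
                clause, predicate, ConsistencyLevel.ONE);
        System.out.println("matched rows: " + rows.size());
        transport.close();
    }

    private static ByteBuffer utf8(String s) throws Exception {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    // DoubleType values are stored as an 8-byte big-endian IEEE 754 double.
    private static ByteBuffer doubleBytes(double d) {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.putDouble(d);
        b.rewind();
        return b;
    }
}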

Both of these columns are of the double data type in the column family, and I generally get one record per minute. I have 3 nodes in the cluster, and nodetool reports that all nodes are up and running, though the load distribution reported by nodetool looks like this:

Starting NodeTool
Address         DC           Rack   Status  State   Load       Owns
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  7.59 GB    51.39%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  394.24 MB  3.81%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  4.42 GB    44.80%
My configuration in cassandra.yaml is as follows.

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
partitioner: org.apache.cassandra.dht.RandomPartitioner
commitlog_sync: periodic
commitlog_sync_period_in_ms: 120000
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 24
sliced_buffer_size_in_kb: 64
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 50000
index_interval: 128

Is there something I may be missing? Are there any problems in the config?

Upvotes: 1

Views: 517

Answers (3)

Dean Hiller

Reputation: 20200

If you come from a relational model, playOrm is just as fast and lets you stay relational on a NoSQL store, but you need to partition your extremely large tables. If you do that, you can then use "scalable JQL" to do your queries:

@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t(:partId) select t FROM TABLE as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")

It also has the @ManyToOne, @OneToMany, etc. annotations for a basic ORM. Some things work differently in NoSQL, but a lot is similar.

Upvotes: 1

Muthu

Reputation: 2685

I finally solved my problem in a different way. In fact, I realized the problem was with my data model.

The problem came about because we come from an RDBMS background. I restructured the data model a little, and now I get responses faster.

Upvotes: -1

Sisso

Reputation: 3077

Duplicate your data in another column family where the row key is your search data. Row slices are much faster.

Personally, I never managed to use secondary indexes in production environments. Either I had timeout problems, or the speed at which data could be retrieved through the secondary index was lower than the rate at which data was being inserted. I think it is related to the reads not being sequential and to HD seek time.
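
A minimal Java Thrift sketch of that approach, assuming a lookup column family named "DataByDeviceDay" keyed by the search values (all names and values here are only placeholders, not your actual schema):

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class DenormalizedLookup {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                       // placeholder keyspace

        // Write each reading a second time, keyed by the values you search on:
        // row key = "<deviceId>:<dayOfYear>", one column per reading.
        ByteBuffer rowKey = utf8("42:200");
        Column reading = new Column();
        reading.setName(utf8("2011-07-19T10:15:00"));            // column name = reading time
        reading.setValue(utf8("23.7"));                          // placeholder measurement
        reading.setTimestamp(System.currentTimeMillis() * 1000);
        client.insert(rowKey, new ColumnParent("DataByDeviceDay"), reading, ConsistencyLevel.ONE);

        // Read it back with a plain row slice instead of a secondary-index scan.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(utf8(""), utf8(""), false, 2000));
        List<ColumnOrSuperColumn> columns = client.get_slice(
                rowKey, new ColumnParent("DataByDeviceDay"), predicate, ConsistencyLevel.ONE);
        System.out.println("readings for device 42, day 200: " + columns.size());
        transport.close();
    }

    private static ByteBuffer utf8(String s) throws Exception {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }
}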

Upvotes: 2
