Reputation: 2685
I am using Cassandra 0.8 with 2 secondary indexes for columns like "DeviceID" and "DayOfYear". I have these two indexes in order to retrieve data for a device within a range of dates. Whenever I get a date filter, I convert this into DayOfYear and search using indexed slices using .net Thrift API. Currently I cannot upgrade the DB as well.
My problem is I usually do not have any issues retrieving rows using the get_indexed_slices query for the current date (using current day of year). But whenever I query for yesterday's day of year (which is one of the indexed column), I get a time out for the first time I make the query. And most of the times, it returns when I query the second time and 100% during the third time.
Both these columns are created as double data type in the column family and I generally get 1 record per minute. I have 3 nodes in the cluster and the nodetool reports suggest that the nodes are up and running, though the load distribution report from nodetool looks like this.
Starting NodeTool
Address DC Rack Status State Load Owns
xxx.xx.xxx.xx datacenter1 rack1 Up Normal 7.59 GB 51.39%
xxx.xx.xxx.xx datacenter1 rack1 Up Normal 394.24 MB 3.81%
xxx.xx.xxx.xx datacenter1 rack1 Up Normal 4.42 GB 44.80%
and my configuration in YAML is as below.
hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
partitioner: org.apache.cassandra.dht.RandomPartitioner
commitlog_sync: periodic
commitlog_sync_period_in_ms: 120000
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 24
sliced_buffer_size_in_kb: 64
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 50000
index_interval: 128
Is there something I may be missing? Are there any problems in the config?
Upvotes: 1
Views: 517
Reputation: 20200
If you come from a relational model, playOrm is just as fast and you can be relational on a noSQL store BUT you just need to partition your extremely large tables. IF you do that, you can then use "scalable JQL" to do your stuff
@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t(:partId) select t FROM TABLE as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")
IT also has the @ManyToOne, @OneToMany, etc. etc. annotations for a basic ORM though some things work differently in noSQL but a lot is the similar.
Upvotes: 1
Reputation: 2685
I finally solved my problem in a different way. In fact I realized the problem is with my data model.
The problem comes because we we come from a RDBMS background. I restructured the data model a little and now, I get responses faster.
Upvotes: -1
Reputation: 3077
Duplicate your data in another column family where the key is your search data. Row slice are mutch faster
Personally I never got to use secondary index in production environments. Or I had problems with timeout, or the speed of data retrieve by secondary index was lower that the amount of data inserted. I think that it is related with not sequentially reading data and HD seek time.
Upvotes: 2