Md. Parvez Alam
Md. Parvez Alam

Reputation: 4596

Filter on dynamodb table

I have a table say location with below struction

|Location(Hashkey-String)|TimeStamp(Range-Numeric)|#visiting_Person|

Now I want to get all the visiting persons between two timestamp

Table myTable= Table.LoadTable(client, "tableName");
QueryFilter filter = new QueryFilter();
filter.AddCondition("TimeStamp", QueryOperator.GreaterThan, 56545454);
Search search = myTable.Query(filter);

But it throw errors -

Query condition missed schema elelment: Location

From error it seems I have to add location (as it is hash key) in filter, but I dont have to filter on a particular location, I have to get any location within timestamp

I tried with setting timestamp as hashkey but I found that we can apply only equal operator on hashkey.

While debugging and changing with some other way i came across with other error i.e. keycondition can not be applied.


Well I am able to do the same with scan, But due to performance I do have to use Query.


Please advise how could i achieve my desired output.

Upvotes: 1

Views: 1890

Answers (2)

Johnny Wu
Johnny Wu

Reputation: 862

Query requires the primary keys of the table so, there isn't a way to Query just on the Range key.

Basically, since DynamoDB partitions based on hash key and Query operates within a partition, you will need to either:

  1. scan across hash keys (performance could be improved time-wise with a parallel scan)
  2. structure a hash key that can be used for the query. E.g. If the range is non-arbitrary you could improve performance by creating an attribute to "bucket" based on some less granular time and building a GSI on it:

    • Suppose the maximum range between the 2 timestamps is 1 day, you could:
    • make an attribute (say "TimeStampDate") that is at date level granularity rather than date+time timestamp granularity.
    • make a GSI with TimeStampDate as the Hash key
    • execute parallel queries on the dates that overlap the queried time range: "TimeStampDate=[start timestamp date] AND TimeStamp BETWEEN [start timestamp / end timestamp]", "TimeStampDate=[start timestamp date + 1 day] AND TimeStamp BETWEEN [start timestamp / end timestamp]", etc.

Upvotes: 1

oleksii
oleksii

Reputation: 35905

I know this won't work in Cassandra (which follows design of the Dynamo). The logic is that you must specify "any preceding keys", e.g. if you have

+---------+---------+--------+
|  Col1   |  Col2   | Value  |
+---------+---------+--------+
| Value 1 | Value 1 |    123 |
| Value 2 | Value 2 |    123 |
| Value 3 | Value 4 |    123 |
+---------+---------+--------+

You must specify Col1 before your can filter by Col2 providing the key is Col1 and Col2. In Cassandra Col1 will indicate on which db server in a cluster ring the record is located, and the Col2 will be a column key in that table on that particular server. Hope it's now clear as to why its important to have a cluster key specified in the search query.

To solve this you have two options:

  • Change your query to pass Location
  • Add another table that will have only two columns TimeStamp and VisitingPerson. The idea is that you just duplicate the data, which is fine in a nosql world

Upvotes: 0

Related Questions