Reputation: 472
Background: We have an api which different api users are calling from different companies. We are logging every request in DataDog
. From DD, I am fetching those logs every minute, get the required data i.e. apiUserKey
, companyName
, dateAccessed
, apiVersion
etc. I am logging each of this request in a DynamoDB table called EpgApiStatistics
as shown in the image.
I can paginate through the list but I also need to maintain the total count for each api user and also the total count for each company(channel). For this purpose I have created two separate tables where I have the apiUserkey
as partition key and channelId
as parition key and then increment the total count.
My question is how can I calculate total count for a given time period or range of time. Let's say we want to see total api calls by a user from 10th to 17th November or month etc?
I have read about GSI (global secondary index) but I am confused how can I use them in my current setup. Thanks for your time.
Upvotes: 0
Views: 1493
Reputation: 25649
Repurpose the SortKey
column to include the log's timestamp:
SortKey
for each log entry becomes: Log#<Timestamp>
or Log#<Date>#<Random>
We can now query by day, month, and year date range for an individual ApiKey
ApiUserKey=<APIKey> and starts_with(SortKey, Log#2021-11)
For date range queries such as weeks or arbitrary from-to dates, you have to combine a query (overfetch based on >
or <
or starts_with
) with a filter expression.
To get logs in a range for all ApiKeys, add a Global Secondary Index, where you reverse the above keys. The timestamp becomes the GSI Partition Key, and the ApiKey the GSI Sort Key.
Same pattern. Add a second GSI, with the channel as GSI PK and the timestamp as GSI SK. You use this second index to query by channel with >, < or = on the timestamps.
Important to note: dynamodb makes a copy of all the indexed line items for each added index.
Upvotes: 1