Reputation: 71
I have a stream of user data being ingested by [Azure Event Hubs] (https://learn.microsoft.com/en-us/azure/event-hubs/).
I am aggregating that data using [Azure Stream Analytics] (https://learn.microsoft.com/en-us/azure/stream-analytics/) by the COUNT(*) of views for a given productId.
I then update the productId's view count in CosmosDB.
In a prototype, I ran a query in a Stream Analytics job like this:
SELECT
productId, COUNT(*) AS views
INTO
cosmosDbOutput
FROM
eventHubInput
TIMESTAMP BY moment
GROUP BY productId, System.Timestamp()
This query upserted the views of a product for all time.
I ran a query in Cosmos to retrieve the most popular views:
SELECT TOP 10 * FROM c
ORDER BY c.views DESC
What I'd like to do is only maintain the views that a product has seen in the last 30 days.
I see that Stream Analytics jobs have [windowing functions] (https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions) that have a max time of 7 days.
Given that limit on windowing functions, is there a way to write a query that aggregates over 7+ days?
If not, is there a different approach in terms of tooling or storage I should consider?
Upvotes: 0
Views: 580
Reputation: 18387
I assume your past data won`t change, so there's no need to use Stream Analytics for it. Once you calculate the count, you can persist it on a SQL Database or Table Services (Cosmos / Storage) and create a materialized view.
more info about materialized view: https://learn.microsoft.com/en-us/azure/architecture/patterns/materialized-view
Upvotes: 1