Reputation: 1812
I don't have a specific query, but rather a design question. I am new to Spark/streaming, so forgive me if this is a dumb question. Please delete it if it is inappropriate for this forum.
We have a requirement to process a huge amount of data every hour and produce output for reporting in Kibana (Elasticsearch). Suppose we have two data models, as shown below. DataModel-1 represents a hashtag and the user IDs of the people who tweeted with that hashtag. DataModel-2 contains a zip code and the users who are in that zip. DataModel-1 is streaming data, and we get almost 40K events per second. DataModel-2 doesn't change that frequently. In the output, we need data that lets us see the trend of a tag for a given zip, for example how many users in a given zip are tweeting with a given tag in a given time period.
I have the questions below.
DataModel-1 [{ hash: #IAMHAPPY, users: [123,134,4566,78899] }]
DataModel-2 [{ zip: zip1, users: [123,134] }, { zip: zip2, users: [4566,78899] }]
Report Data Model [ { zip: zip1, hash: [#IAMHAPPY] }, { zip: zip2, hash: [#IAMHAPPY] } ]
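To make the relationship between the three models concrete, here is a minimal plain-Python sketch of the join, not a Spark implementation. It assumes each user belongs to exactly one zip, and the function names (`build_user_to_zip`, `tag_counts_by_zip`, `to_report`) are hypothetical, chosen only for illustration. It inverts DataModel-2 into a user-to-zip lookup, counts distinct users per (zip, hash) pair for one batch of DataModel-1 events, and collapses the result into the Report Data Model shape.

```python
from collections import defaultdict

# Sample records copied from the shapes shown above (illustrative only).
tag_events = [{"hash": "#IAMHAPPY", "users": [123, 134, 4566, 78899]}]
zip_users = [
    {"zip": "zip1", "users": [123, 134]},
    {"zip": "zip2", "users": [4566, 78899]},
]

def build_user_to_zip(zip_records):
    """Invert DataModel-2 into a user -> zip lookup.

    Assumption: each user appears in exactly one zip.
    """
    return {user: rec["zip"] for rec in zip_records for user in rec["users"]}

def tag_counts_by_zip(events, user_to_zip):
    """Count distinct users per (zip, hash) pair for one batch of events."""
    seen = defaultdict(set)
    for event in events:
        for user in event["users"]:
            zip_code = user_to_zip.get(user)
            if zip_code is not None:  # skip users with no known zip
                seen[(zip_code, event["hash"])].add(user)
    return {key: len(users) for key, users in seen.items()}

def to_report(counts):
    """Collapse the counts into the Report Data Model shape: zip -> list of tags."""
    tags_by_zip = defaultdict(list)
    for zip_code, tag in sorted(counts):
        tags_by_zip[zip_code].append(tag)
    return [{"zip": z, "hash": tags} for z, tags in sorted(tags_by_zip.items())]

user_to_zip = build_user_to_zip(zip_users)
counts = tag_counts_by_zip(tag_events, user_to_zip)
report = to_report(counts)
```

Since DataModel-2 changes rarely and is much smaller than the event stream, the lookup is the side that could plausibly be kept in memory on each worker (for example as a broadcast variable in Spark), while the 40K events per second flow through the counting step.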
Upvotes: 0
Views: 112
Reputation: 388
My opinions are below:
Upvotes: 0
Reputation: 1110
Upvotes: 1