QuaffAPint
QuaffAPint

Reputation: 227

MongoDB Analytics Schema

New to Mongo, taking on the analytics quest and have a schema question.

I'm coming from SQL server, where I have 4 Tables...

stats_landing_table
-------------------
id
post_id
visit_date
country
browser
impressions

stats_geo_table
---------------
id
post_id
date
country
country_count

stats_browser_table
-------------------
id
post_id
date
browser
browser_count

stats_impressions_table
-----------------------
id
post_id
date
impression_count

With the landing table - I just throw all data into there each time - no updates, just inserts. On there I have a trigger that looks at the other tables and does the appropriate update or insert and then removes the entry fron the landing table.

So, I need FAST inserts/updates, with the ability to go back and display reports about each attribute around the post (country, browser, impressions, etc) based on a whole or on a particular timerange.

With Mongo, it comes down to whether I should embed or have separate documents, or I guess an entirely different schema. If I embed, I'm trying to figure out how I would update something like the browser table, while at the same time still checking for the country and updating impressions. It sounds like I would need a separate update for each thing (like, this vistor is using firefox from germany, while this one is firefox from finland - can't do that in a single query from what I see). The other option is to do different documents, but that's the same problem - I would need to do an update for each collection - that's a lot of network traffic vs the SQL server single call.

I'm open to any suggestions of how to design something like this that allows for quick performance update/inserts, while still allowing reporting.

Upvotes: 3

Views: 1248

Answers (1)

spf13
spf13

Reputation: 561

from what I understand of this situation, the three tables are summary tables where the unique combination of day, post_id and the given metric in the case of browser or country are then tallied.

I would skip the insert table all together as you don't keep it today and use it more or less as a queue.

With MongoDB the writes will all be written to memory so it is extremely fast. It can also be done asynchronously (fire and forget it style).

I would create a new document for each unique date and post_id. It would have two arrays one for browsers and one for countries where the key is the country code or browser and the value is the count. The document would also have a field for impression count.

Each visit would update a single document and would be easy to query.

The update would use the atomic operator "$inc" to increment the counts atomically on the server so no data would need to be pulled in to do the update. It would be a very small amount of data to transfer.

Does that answer the question?

Upvotes: 3

Related Questions