Reputation: 4211
I am in the middle of developing an app which harvests tweets, Facebook statuses and Facebook photos for a user. Currently the user sets exactly when they want this harvest to start and end, and a spider pulls the data during this period. The start and end times are stored in a MySQL db, and my plan was to store all the tweets, statuses and photo metadata in MongoDB (with the actual images on S3).
I was thinking I would just create one collection for each period the user wants to harvest, and then store all the tweets, statuses and photo metadata from that period in that collection.
Does this seem like a reasonable approach?
Upvotes: 2
Views: 1102
Reputation: 45307
Does this seem like a reasonable approach?
What is the #1 user query? Is it "find activity by period"? If users only ever want to "find by period", then this makes sense.
However, if users want an accumulated view, you then have to gather a user's history from several collections and merge it for display.
If you want both a "by this period" and an "accumulated", then I suggest simply stuffing all data into a single user object. It's easy to tag the individual actions with a "harvest run" and a "timestamp".
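To make the single-document idea concrete, here is a minimal sketch using plain Python dicts rather than a live database. The field names (`actions`, `harvest_run`, `timestamp`, `s3_key`) and the two helper functions are assumptions made up for illustration, not a schema MongoDB requires.

```python
from datetime import datetime

# Hypothetical shape of one per-user document; every field name is illustrative.
user_doc = {
    "_id": "user-123",
    "actions": [
        {"type": "tweet", "harvest_run": 1,
         "timestamp": datetime(2011, 3, 1, 9, 30), "text": "first tweet"},
        {"type": "status", "harvest_run": 1,
         "timestamp": datetime(2011, 3, 2, 18, 0), "text": "a status update"},
        {"type": "photo", "harvest_run": 2,
         "timestamp": datetime(2011, 4, 10, 12, 15), "s3_key": "photos/abc.jpg"},
    ],
}


def actions_for_run(doc, run_id):
    """'By this period' view: only the actions tagged with one harvest run."""
    return [a for a in doc["actions"] if a["harvest_run"] == run_id]


def accumulated_actions(doc):
    """'Accumulated' view: every action for the user, oldest first."""
    return sorted(doc["actions"], key=lambda a: a["timestamp"])
```

Both views come from the same document, so nothing has to be merged across collections. The photo entry holds only a pointer to the image, not the image itself.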
Mongo Details: MongoDB can handle individual documents up to about 4MB. More recent versions raise this to 8 or 16MB. If you're only using this space for text, please realize that this is a lot of text. A copy of War and Peace is just over 3MB. So you're talking about hundreds of pages of text in 4MB. With 8 or 16MB, you can probably store status updates and tweets for years for most people.
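A rough back-of-the-envelope check of the "years of tweets" claim, as a sketch only. The per-item size (240 bytes, i.e. a 140-character tweet plus about 100 bytes of metadata) and the 20 items per day are assumptions picked for illustration; real documents will vary.

```python
def items_per_document(limit_mb, bytes_per_item):
    """How many items of a given size fit under a document size limit."""
    return (limit_mb * 1024 * 1024) // bytes_per_item


def years_of_history(limit_mb, bytes_per_item, items_per_day):
    """How many years of activity fit in one document at a steady daily rate."""
    days = items_per_document(limit_mb, bytes_per_item) / items_per_day
    return days / 365


# Assumed numbers: 240 bytes per stored tweet, 20 tweets per day.
print(items_per_document(4, 240))
print(items_per_document(16, 240))
print(round(years_of_history(16, 240, 20), 1))
```

Under these assumptions a 16MB document holds several years of activity for a fairly active user, while a very heavy poster would hit the limit sooner.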
Note that MongoDB has GridFS for storing binary data (like image files), so you'll typically store just pointers to these in the User document.
Upvotes: 4