Reputation: 4649
I have an app that listens to a websocket and it stores usernames/userID's (Usernames are 1-20 bytes, UserID's are 17 bytes). This is not a big deal because it's only one document. However, every round they participate in, it pushes the round ID (24 bytes) and a 'score' decimal value (ex: 1190.0015239999999).
The thing is, there is no limit to how many rounds there are and I can't afford to pay so much per month for mongolab. What's the best way to handle this data?
My thoughts: - If there is a way to replace the _id: field in mongodb, I will replace it with the userID which is 17 bytes long. Not sure if I can do that though.
Store user data with timestamps and remove OLD data that has a score value less than 200.
Cut off user names that are more than 10 characters.
Completely remove Round ID's (Or replace the _id field with roundId). (Won't work since there are multiple roundID's in each document)
Round the decimal value to two places.
Remove Round ID's after 30 days
tl;dr
Need to store data efficiently < 500 mb in mongo lab
Documents consists of username(1-20 characters), userid(17 characters), round(Object Array) = [{round Id(24 characters), score(1190.0015239999999)}].
Thanks in advance!
Edit:
Document Schema:
userID: {type: String},
userName: {type: String},
rounds: [{roundID: String, score: String}]
Upvotes: 0
Views: 240
Reputation: 20703
Modelling 1:n relationships as embedded document is not the best except for very rare cases. This is because there is a 16MB size limit for BSON documents at the time of this writing.
A better (read more scalable and efficient approach) is to do use document references.
First, you need your player data, of course. Here is an example:
{
_id: "SomeUserId",
name: "SomeName"
}
There is no need for an extra userId
field since each document needs to have a _id
field with unique values anyway. Contrary to popular belief, this fields value does not have to be an ObjectId. So we already reduced the size you need for your player data by 1/3, if I am not mistaken.
Next, the results of each round:
{
_id: {
round: "SomeString",
player: "SomeUserId"
},
score: 5,
createdAt: ISODate("2015-04-13T01:03:04.0002Z")
}
A few things are to note here. First and foremost: Do NOT use strings to record values. Even grades should rather be stored as corresponding numerical values. Otherwise you can not get averages and alike. I'll show more of that later. We are using a compound field for _id
here, which is perfectly valid. Furthermore, it will give us a free index optimizing a few of the most likely queries, like "How did player X score in round Y?"
db.results.find({"_id.player":"X","_id.round":"Y"})
or "What where the results of round Y?"
db.results.find({"_id.round":"Y"})
or "What we're the scores of Player X in all rounds?"
db.results.find({"_id.player":"X"})
However, by not using a string to save the score, even some nifty stats become rather cheap, for example "What was the average score of round Y?"
db.results.aggregate(
{ $match: { "_id.round":"Y" } },
{ $group: { "round":"$_id.round", "averageScore": {$avg:"$score"} }
)
or "What is the average score of each player in all rounds?"
db.results.aggregate(
{ $group: { "player: "$_id.player", "averageAll": {$avg:"$score"} }
)
While you could do these calculation in your application, MongoDB can do them much more efficiently since the data does not have to be send to your app prior to processing it.
Next, for the data expiration. We have a createdAt
field, of type ISODate. Now, we let MongoDB take care of the rest by creating a TTL index
db.results.ensureIndex(
{ "createdAt":1 },
{ expireAfterSeconds: 60*60*24*30}
)
So all in all, this should be pretty much the most efficient way of storing and expiring your data, while improving scalability in the same time.
Upvotes: 1
Reputation: 2135
So currently you are storing three data points in the array for each record.
_id: false will prevent mongoose from automatically creating an id for the document. If you don't need roundID, then you can use the following which only stores one data point in the array:
round[{_id:false, score:String}]
Otherwise if roundID actually has meaning, use the following which stores two data points in the array:
round[{_id:false, roundID: string, score:String}]
Lastly, if you just need an ID for reference purposes, use the following, which will store two data points in the array - a random id and the score:
round[{score:String}]
Upvotes: 0