Reputation: 254
I have a "Vote" table in my database which is growing in size everyday, currently at around 100 million rows. For internal analytics / insights I used to have a rake task which would compute a few basic metrics, like the number of votes made daily in the past few days. It's just a COUNT with a where clause on the date "created_at".
This rake task was doing fine until I deleted the index on "created_at" because it seems that it had a negative impact on the app performance for all the other user-facing queries that didn't need this index, especially when inserting a new row.
Currently I don't have a lot of insights as to what is going on in my app and in this table. However I don't really want to add indexes on such a large table if it's only for my own use.
What else can I try ?
Upvotes: 1
Views: 537
Reputation: 6682
Alternately, you could sidestep the Vote
table altogether and keep an external tally.
Every time a vote is cast, a separate tally class that keeps a running count of votes cast will be invoked. There will be one tally record per day. A tally record will have an integer representing the number of votes cast on that day.
Each increment call to the tally class will find a tally record for the current date (today), increment the vote count, and save the record. If no record exists, one will be created and incremented accordingly.
For example, let's have a class called VoteTally
with two attributes: a date (date), and a vote count (integer), no timestamps, no associations. Here's what the model will look like:
class VoteTally < ActiveRecord::Base
def self.tally_up!
find_or_create_by_date(Date.today).increment!(:votes)
end
def self.tally_down!
find_or_create_by_date(Date.today).decrement!(:votes)
end
def self.votes_on(date)
find_by_date(date).votes
end
end
Then, in the Vote
model:
class Vote < ActiveRecord::Base
after_create :tally_up
after_destroy :tally_down
# ...
private
def tally_up ; VoteTally.tally_up! ; end
def tally_down ; VoteTally.tally_down! ; end
end
These methods will get vote counts:
VoteTally.votes_on Date.today
VoteTally.votes_on Date.yesterday
VoteTally.votes_on 3.days.ago
VoteTally.votes_on Date.parse("5/28/13")
Of course, this is a simple example and you will have to adapt it to suit. This will result in an extra query during vote casting, but it's a hell of a lot faster than a where
clause on 100M records with no index. Minor inaccuracies are possible with this solution, but I assume that's acceptable given the anecdotal nature of daily vote counts.
Upvotes: 1
Reputation: 12998
If the index has really an impact on the write performance, and it's only a few persons which run statistics now and then, you might consider another general approach:
You could separate your "transaction processing database" from your "reporting database".
You could update your reporting database on a regular basis, and create reporting-only indexes only there. What is more queries regarding reports will not conflict with transaction-oriented traffic, and it doesn't matter how long they run.
Of course, this increases a certain delay, and it increases system complexity. On the other hand, if you roll-forward your reporting database on a regular basis, you can ensure that your backup scheme actually works.
Upvotes: 0
Reputation: 78463
It's just a COUNT with a where clause on the date "created_at".
In that case the only credible index you can use is the one on created_at
...
If write performance is an issue (methinks it's unlikely...) and you're using a composite primary key, clustering the table using that index might help too.
Upvotes: 0