user915831

Reputation: 241

MySQL replication - is it worth it?


I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.

I then have a website that utilises this data.

I'm trying to build my environment with the future in mind.

I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands that are happening on the main database.

I am having difficulty getting this set up, despite MySQL reporting that it's all working fine.

I then started to think: is there not a better way? From what I understand, the master sends the write commands on to the slave database.

Does this not mean that what I am trying to avoid is just happening anyway? Does this mean that the slave database will suffer thousands of writes too?

I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way. I am getting a bit lost!

I have a dedicated server and a VPS, using PHP 5 and MySQL 5 in a LAMP stack.

I cannot begin to tell you how much I would appreciate some guidance!

Upvotes: 1

Views: 342

Answers (4)

bikedorkseattle

Reputation: 981

You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm making an assumption here that you are batch-processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It actually isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes to run be done in Solr in under a second.

Just make sure you wrap the query in a try statement in the event your Solr instance goes down.
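A minimal sketch of that fallback pattern, in Python for brevity. `solr_search` and `mysql_search` are placeholder names standing in for your real clients (e.g. a pysolr call and a MySQL LIKE query), not actual APIs:

```python
# Hypothetical fallback sketch: try Solr first, fall back to MySQL if it's down.

def solr_search(term):
    # In a real app this would query your Solr instance over HTTP.
    # Simulated as unreachable here to demonstrate the fallback.
    raise ConnectionError("Solr instance is down")

def mysql_search(term):
    # Slower but reliable fallback: a plain query against MySQL.
    return ["feed record mentioning %s" % term]

def search(term):
    try:
        return solr_search(term)
    except ConnectionError:
        # Solr unavailable: degrade gracefully instead of erroring out.
        return mysql_search(term)

print(search("replication"))
```

The point is that the site keeps answering searches (slowly) rather than throwing errors while Solr is restarted.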

Upvotes: 0

PA.

Reputation: 29339

I would split this in three layers.

  1. Data Feed layer. Data read from the feeds is preprocessed and posted into a queue. This layer has a temporary queue that also serves as temporary storage, a buffer that allows every data feed to post its data. I'd use a message queue system; it's fast and reliable.

  2. Data Store layer. This layer reads from the queue, optionally processes the data in some way, and stores it in the database.

  3. Data Analysis layer. This is your "slave" database. It's a data warehouse. It periodically runs ETL (extract, transform and load) to move data from the Data Store layer into this secondary database.

This layered approach allows you to isolate concerns (speed, reliability, security) and implementation details, and allows for future scalability.
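The three layers above can be sketched roughly like this, using Python's stdlib `queue.Queue` as a stand-in for a real message queue system; all names are illustrative, not tied to any particular product:

```python
import queue

feed_queue = queue.Queue()   # Layer 1: buffer between the feeds and storage
data_store = []              # Layer 2: stand-in for the MySQL tables

def poll_feed(records):
    # Layer 1: preprocess each record and post it onto the queue.
    for rec in records:
        feed_queue.put(rec.strip())

def store_worker():
    # Layer 2: drain the queue and write into the data store.
    while not feed_queue.empty():
        data_store.append(feed_queue.get())

def run_etl():
    # Layer 3: periodically extract/transform/load into the warehouse copy.
    return list(data_store)

poll_feed(["  feed-a:price=1 ", "feed-b:price=2"])
store_worker()
warehouse = run_etl()
```

Because the feeds only ever touch the queue, a burst of incoming data can never bog down the database the website reads from.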

Upvotes: 1

Furicane

Reputation: 1183

Replication is literally what the word suggests - replicating queries on another machine. MySQL creates a log filled with the queries that were used to create the dataset on the original machine (the master) and sends it to the slave(s), which read the log and re-execute those queries.
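For reference, a bare-bones MySQL 5 replication setup looks like this (hostname, user and log coordinates below are placeholders you'd replace with your own values from `SHOW MASTER STATUS`):

```sql
-- master my.cnf needs: server-id = 1 and log_bin = mysql-bin
-- slave my.cnf needs:  server-id = 2

-- on the slave, point at the master and start replicating:
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',   -- placeholder host
  MASTER_USER = 'repl',                 -- placeholder replication user
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001', -- from SHOW MASTER STATUS
  MASTER_LOG_POS = 4;
START SLAVE;

-- verify: both Slave_IO_Running and Slave_SQL_Running must say Yes
SHOW SLAVE STATUS\G
```

`SHOW SLAVE STATUS` is also the place to check when "MySQL reports it's all working fine" but rows aren't appearing - look at `Seconds_Behind_Master` and `Last_Error`.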

Basically, what you want is to increase your write rate. That's achievable through using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store up to 50 GB of user data and use it for free).

What you want (for the moment) is a fast HDD subsystem more than a monolithic write-scalable storage system. InnoDB is capable of achieving a lot of queries per second on a properly configured machine with sufficient hardware. I am not sure about pricing, but an SSD and 4-8 GB of RAM shouldn't be that expensive. As Marc B said - until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.

Upvotes: 0

Marc B

Reputation: 360682

If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.

Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5000 records, you'd have 86400/5000 = 17.28 seconds per record. That's very minimal write overhead.
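The arithmetic above, spelled out (86,400 seconds in a day, spread evenly over 5,000 records):

```python
# Time budget per record at 5000 records/day, assuming an even spread.
seconds_per_day = 86400
records_per_day = 5000
seconds_per_record = seconds_per_day / records_per_day
print(seconds_per_record)  # 17.28
```
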

If you were doing millions of records a day, THEN you'd have a write bottleneck.

Upvotes: 3
