user915831
user915831

Reputation: 241

MySQL replication vs other techniques

Im having a really hard time trying to go down the RIGHT road in a project.

I'm a one man band with a tight budget. 2 dedicated servers MySQL 5 / php5

I'm using server 1 to consume a lot of data from various feeds. The server/software is running 24/7 generating a huge database.

Server 2 - holds a copy Of the database with a web frontend

I don't have any experience of MySQL replication. I've been researching and from what I can tell the slaves are updated right after the master.

I want to have a very speedy website so that's why the processing is done on server 1, whilst sever 2 simply selects data.

If MySQL replication is mimicking server 1 then surely this is going slow down server 2 and have the opposite of the desired effect.

What I thought might best suit this scenario is to write a script to automate the process.

Server 2 has 2 databases. One for live one for processing.

The script ascertains which database is live and instead uses the other one.

It's drops any tables in it. The script dumps the database from server 1. Installs it on server 2's newly emptied database. The script changes the websites config file to utilise the new database.

The process can be repeated over and over.

Whilst the database install will be large it can happen its entirety at night and should mean no down time.

Is this better than doing MySQL replication ? I would welcome advice.

Upvotes: 4

Views: 402

Answers (4)

dlo
dlo

Reputation: 1583

You really haven't provided enough info on what you're aiming to do, but here's my best understanding: server1 is fetching data (using bandwidth) and processing it in some way, (using processing power and I/O); server2 is serving live info to users that is based on the post-processed data. Availability for server2 is more important than for server1, and a problem on server1 should not affect server2's operations.

If the there's a significant difference between the raw data that server1 is fetching and the 'finished' data for use on server2, perhaps with some temporary data being produced along the way, just have server1 do its work, and use some kind of a script to periodically bring post-processed data from server1 to server2. Perhaps post-processed data is smaller than the raw stuff that server1 is working on?

If server1 is not really doing much processing, just fetching of data and insertion into db, then replication might be reasonable way to move data from #1 to #2.

An in-between approach would be to only replicate certain post-processed tables, so server1 can do its work in other tables in mysql, and when the final product is being inserted into the replicated table, it will automatically appear on server2.

Have fun.

Upvotes: 0

derobert
derobert

Reputation: 51197

Its hard to believe that a database dump/load cycle would be faster than replication. Especially row-based (non-query) replication. Replication can be lagged (by running SLAVE STOP SQL_THREAD on the slave) if you don't want it during peak times (but of course you must have sufficient non-peak times to catch up). (Remember that MySQL has three replication modes: statement, row, and mixed. Statement-based does the exact same update load on the slaves, row-based just sends the rows that changed, and should be fairly cheap CPU-wise)

Either all your slaves are fast enough to apply changes, and still have plenty of I/O bandwidth and CPU time to handle SELECTs, or no number of slaves will help. Its possible some other method (e.g., direct copying of data files) might be faster, but more fragile, and really you're talking some relatively minor gains. If you can't handle the update load, your choice with MySQL is to shard (split so each server is only responsible for part of the data) or buy faster hardware.

But ultimately, this is all taking shots in the dark. You can fairly easily change from replication, to rsync, to some insane scheme involving drbd, to whatever, that really only affects your database layer, maybe only the database itself. You need actual benchmarks—actual data—to make decisions like this. I will tell you that as a general rule, properly-designed large OLTP databases run out of I/O bandwidth first.

I'd suggest start with what's easy. And that'd be a single database server, or built-in replication. Keep in mind that sharding may be necessary at some point.

Actually, there is probably one question you want to answer fairly early: Do you really want to go with MySQL? Consider PostgreSQL.

Upvotes: 4

Matthieu Napoli
Matthieu Napoli

Reputation: 49703

You say "If MySQL replication is mimicking server 1 then surely this is going slow down server 2 and have the opposite of the desired effect."

I don't think this is going to slow down the server. Have you tried it and measured any performance difference? I really think this is the best way to go for you, unless you clearly measure a performance impact because of the replication.

Upvotes: 0

Joshua
Joshua

Reputation: 5514

A high volume of inserts can most certainly impact front end performance, but the answer for your scenario depends on very specifically how your processing engine impacts resources. There are certain combinations of settings that will allow high performance on selects while inserting data constantly. It depends on your specific duty cycle, storage engine, indexing scheme, etc.

You start by thoroughly understanding table locking http://dev.mysql.com/doc/refman/5.0/en/table-locking.html This is a must!

Then you can explore features like INSERT DELAYED http://dev.mysql.com/doc/refman/5.0/en/insert-delayed.html

And optimize your indices (as few as possible) to reduce the impact of each insert http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

Since it sounds like your requirements are driven by lots of data growth (inserts), if you can't get the performance you need from a single instance, replication probably won't help. In which case you should go for the nightly load scenario.

We have a similar use case, and we do nightly batch loads, with replication for backup/failover purposes only.

Upvotes: 0

Related Questions