Reputation: 346
I have run into a slight problem. The story goes as follows:
I have a document archive system (written in PHP) which runs at multiple client sites (23 at present). Each client's system holds only its own documents. Every night, they all need to be synced to a master database on site (a central server). I have access to each client's MySQL database from the central server, so connecting to them is no problem.
I have a script that connects to the client database and selects all the entries from a table where the sync column = '0000-00-00 00:00:00' (the default, indicating the record hasn't been synced). It then iterates through each record, inserts it into the central server, and sets the sync time on the client record to the time the script was executed. This works, but it obviously has a large overhead from the many individual queries, and I have only just noticed the problem.
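Roughly, the loop looks like this (a simplified sketch; table, column and connection names are placeholders, not my actual code):

    <?php
    // Simplified sketch of the current per-record sync.
    $client = new mysqli('client-host', 'user', 'pass', 'docs');
    $master = new mysqli('master-host', 'user', 'pass', 'docs');
    $now    = date('Y-m-d H:i:s');

    $rows = $client->query("SELECT * FROM documents WHERE sync = '0000-00-00 00:00:00'");
    while ($row = $rows->fetch_assoc()) {
        // One INSERT on the master plus one UPDATE on the client per document:
        // this per-record round-tripping is where the overhead comes from.
        $stmt = $master->prepare('INSERT INTO documents (id, title, body) VALUES (?, ?, ?)');
        $stmt->bind_param('iss', $row['id'], $row['title'], $row['body']);
        $stmt->execute();

        $client->query("UPDATE documents SET sync = '$now' WHERE id = {$row['id']}");
    }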
Each client can generate 2,000-3,000 documents a day. At those volumes the sync is taking far too long (about 1 second per 2 documents).
Is there a better solution to my problem? Preferably a PHP-scripted solution, as I need to write logs to check whether everything was successful.
Thanks
EDIT: My current process is:
This is a script run on the central server. Now that I come to think of it, I can remove step 7 and make it part of step 5, but that won't reduce the processing time by much.
Upvotes: 0
Views: 3823
Reputation: 3427
There's another possibility if you can't use the Sync Framework:
Is it possible for you to distribute the load throughout the day instead of at the end of the day? Say, trigger synchronization every time 10 new documents come in or 10 edits are done? (This can be done if the synchronization is initiated from the client side.)
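For example, a client-side hook could fire the sync once enough work has piled up (a rough sketch; the table name, the threshold, and run_sync() are placeholders for your existing logic):

    <?php
    // Hypothetical hook, run after each document insert/edit on the client.
    $db = new mysqli('localhost', 'user', 'pass', 'docs');

    $res = $db->query("SELECT COUNT(*) AS pending FROM documents
                       WHERE sync = '0000-00-00 00:00:00'");
    $pending = (int) $res->fetch_assoc()['pending'];

    if ($pending >= 10) {
        // Kick off the sync for just this small batch instead of waiting
        // for the nightly run. run_sync() stands in for however you
        // currently invoke your sync script.
        run_sync();
    }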
If you want to keep the sync logic on the server side, you could use messaging queues to send notifications from the clients to the server whenever a client needs to synchronize. The server can then pull the data. You could use an in-house service bus or an on-demand platform like Azure AppFabric or Amazon SQS for this.
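With Amazon SQS, for instance, the client-side notification could be as small as this (a sketch using the AWS SDK for PHP; the region, queue URL and message shape are illustrative):

    <?php
    require 'vendor/autoload.php';

    use Aws\Sqs\SqsClient;

    // Sketch only: credentials are assumed to come from the environment.
    $sqs = new SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);
    $sqs->sendMessage([
        'QueueUrl'    => 'https://sqs.us-east-1.amazonaws.com/123456789012/doc-sync',
        'MessageBody' => json_encode(['client_id' => 3, 'pending' => 10]),
    ]);
    // A worker on the central server reads these messages and pulls the
    // data from the named client.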
Upvotes: 0
Reputation: 3427
I know you prefer a PHP-based solution, but you might want to check out the Microsoft Sync Framework:
http://msdn.microsoft.com/en-in/sync/default(en-us).aspx
This would require the sync module to be written in .NET, but there is a huge advantage in terms of built-in sync logic and exception handling (network failures, sync conflicts, etc.), which will save you time.
The framework handles non-SQL Server databases as well, as long as there is a .NET database connector. MySQL should be supported quite easily; just take a sample from the following link:
http://code.msdn.microsoft.com/sync/Release/ProjectReleases.aspx?ReleaseId=4835
and adapt it to MySQL.
Upvotes: 0
Reputation: 48387
The basic method sounds OK, but taking 0.5 seconds per operation is ridiculously excessive. How much data are you pulling across the network? The entire image? Are you doing anything else in the operation? Is there an index on the sync column?
You could get a small benefit by exporting the un-synced data on the database side:
1) mark all records available for sync with a transaction ID in a new column
2) extract all records flagged in the first step into a flat file
3) copy the file across the network
4) load the data into the master DB
5) if successful, notify the origin server
6) the origin server then sets the sync time for all records flagged with that transaction ID
This would require 3 scripts: 2 on the origin server (one to prepare and send the data, one to flag the records as complete) and one on the replicated server to load the data and report the outcome.
But this is probably not going to make big inroads into the performance problem; the current timing seems absurdly slow if you are only replicating metadata about each image (rather than the image itself).
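A rough sketch of steps 1, 2, 4 and 6 (table, column and file names are assumptions; MySQL's SELECT ... INTO OUTFILE and LOAD DATA INFILE do the heavy lifting):

    <?php
    // On the origin (client) server: steps 1 and 2.
    $client = new mysqli('client-host', 'user', 'pass', 'docs');
    $txn = uniqid('sync_', true);   // the transaction ID for this batch

    // Step 1: flag everything that needs syncing with the transaction ID.
    $client->query("UPDATE documents SET txn_id = '$txn'
                    WHERE sync = '0000-00-00 00:00:00'");

    // Step 2: dump the flagged rows to a flat file (written on the DB host).
    $client->query("SELECT * FROM documents WHERE txn_id = '$txn'
                    INTO OUTFILE '/tmp/$txn.csv'
                    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'");

    // Step 3 (copying the file across) happens out of band, e.g. scp/rsync.

    // On the master: step 4, one bulk load instead of thousands of INSERTs.
    $master = new mysqli('master-host', 'user', 'pass', 'docs');
    $master->query("LOAD DATA INFILE '/tmp/$txn.csv' INTO TABLE documents
                    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'");

    // Steps 5 and 6: on success, the origin stamps the sync time.
    $client->query("UPDATE documents SET sync = NOW() WHERE txn_id = '$txn'");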
C.
Upvotes: 0
Reputation: 165271
I'd suggest using auto_increment_increment to keep all the IDs unique across all of the servers. Then all you need to do is SELECT * FROM blah WHERE sync = '0000-00-00 00:00:00', generate the insert statements, and execute them. You won't have to deal with any kind of conflict resolution for conflicting primary keys...
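For example, with 23 clients you could leave some headroom and set the increment to 25, giving each client its own offset (the values are illustrative; persistently this belongs in each client's my.cnf):

    <?php
    // Runtime equivalent of my.cnf settings (requires the SUPER privilege):
    //   auto_increment_increment = 25   (> number of servers, for headroom)
    //   auto_increment_offset    = 3    (this client's unique number)
    $db = new mysqli('localhost', 'user', 'pass', 'docs');
    $db->query('SET GLOBAL auto_increment_increment = 25');
    $db->query('SET GLOBAL auto_increment_offset = 3');
    // Every AUTO_INCREMENT ID generated on this client is now of the form
    // 25n + 3, so it can never collide with IDs from the other clients.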
As for the long query times, you need to look at the size of your data. If each record is sizable (a few hundred KB or more), it's going to take time...
One option may be to create a FEDERATED table for each child server's table. Then do the whole thing in SQL on the master: INSERT INTO master_table SELECT * FROM child_1_table WHERE sync = '0000-00-00 00:00:00'... You avoid pulling all of the data into PHP, you can still run checks to make sure everything went well, and you can still log, since everything is still executed from PHP land...
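A sketch of what that could look like (the table definitions, credentials and connection string are assumptions; the FEDERATED engine must be enabled on the master, and the federated table's columns must match the remote table):

    <?php
    $master = new mysqli('master-host', 'user', 'pass', 'docs');

    // One-time setup: a FEDERATED table on the master that proxies the
    // documents table on client 1.
    $master->query("
        CREATE TABLE child_1_table (
            id    INT NOT NULL,
            title VARCHAR(255),
            sync  DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
            PRIMARY KEY (id)
        ) ENGINE=FEDERATED
          CONNECTION='mysql://sync_user:sync_pass@client1-host:3306/docs/documents'
    ");

    // Nightly sync: two statements per client, all executed server-side.
    $ok = $master->query("INSERT INTO master_table
                          SELECT * FROM child_1_table
                          WHERE sync = '0000-00-00 00:00:00'");
    if ($ok) {
        // Writing through the federated table updates the client's rows.
        $master->query("UPDATE child_1_table SET sync = NOW()
                        WHERE sync = '0000-00-00 00:00:00'");
        // log success for this client here
    } else {
        // log $master->error for this client
    }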
Upvotes: 1