Reputation: 7106
What are the possibilities to distribute data selectively?
I explain my question with an example. Consider a central database that holds all the data. This database is located in a certain geographical location.
Application A needs a subset of the information present in the central database. Also, application A may be located in a geographical location different (and maybe far) from the one where the central database is located.
So, I thought about creating a new database at the same location of application A that would contain a subset of information of the central database.
Which technology/product allow me to deploy such a configuration?
Thanks
Upvotes: 6
Views: 252
Reputation: 119
It's important to address the possibility of conflicts at the design phase anytime you are talking about replicating databases.
Moving on from that, SAP's Sybase Replication Server will allow you to do just that, either with Sybase database's or 3rd party databases.
In Sybase's world this is frequently called a corporate roll-up environment. There may be multiple geographically seperated databases each with a subset of data which they have primary control over. At the HQ, there is a server that contains all the various subsets in one repository. You can choose to replicate whole tables, or replicate based on values in individual rows/columns.
This keeps the databases in a loosely consistent state. Transaction rates, Geographic separation, and the latency that can be inherent to network will impact how quickly updates move from one database to another. If a network connection is temporarily down, Sybase Replication Server will queue up transaction, and send them as soon as the link comes back up, but the reliability and stability of the replication system will be affected by the stability of the network connection.
Again, as others have stated it's not cheap, but it's relatively straight forward to implement and maintain.
Disclaimer: I have worked for Sybase, and am still part of the SAP family of companies.
Upvotes: 0
Reputation: 24481
One big question that you didn't address is if Application A needs read-only access to the data or if it needs to be read-write.
The immediate concept that comes to mind when reading your requirements is sharding. In MySQL, this can be accomplished with partitioning. That being said, before you jump into partitions, make sure you read up on their pros and cons. There are instances where partitioning can slow things down if your indexes are not well chosen, or your partitioning scheme is not well thought out.
If your needs are read-only, then this should be a fairly simple solution. You can use MySQL in a Master-Slave context, and use App A off a slave. If you need read-write, then this becomes much more complex.
Depending on your write needs, you can split your reads to your slave, and your writes to the master, but that significantly adds complexity to your code structure (need to deal with multiple connections to multiple dbs). The advantage of this kind of layout is that you don't need to have complex DB infrastructure.
On the flip side, you can keep your code as is, and use a Master-Master replication in MySQL. Although not officially supported by Oracle, a lot of people have had success in this. A quick Google search will find you a huge list of blogs, howtos, etc. Just keep in mind that your code has to be properly written to support this (ex: you cannot use auto-increment fields for PKs, etc).
If you have cash to spend, then you can look at some of the more commercial offerings. Oracle DB and SQL Server both support this.
You can also use Block Based data replication, such as DRDB (and Mysql DRDB) to handle the replication between your nodes, but the problem you always will encounter is what happens if your link between the two nodes fails.
The biggest issue you will encounter is how to handle conflicting updates in 2 separate DB nodes. If your data is geographically dependent, then this may not be an issue for you.
Long story short, this is not an easy (or inexpensive) problem to resolve.
Upvotes: 1
Reputation: 310
Most major database software such as MySql and SQL server can do the job, but it is not a good model. With the growth of the application (traffic and users), not only will you create a load on the central database server (which might be serving other applications),but you will also be abusing your network bandwidth to transfer data between the far away database and the application server.
A better model is to keep your data close to the application server, and use the far away database for backup and recovery purposes only. You can use an FC\IP SAN (or any other storage network architecture) as your storage network model, based on your applications' needs.
Upvotes: 2
Reputation: 25484
Look for database replication. SQL Server can do this for sure, others (Oracle, MySQL, ...) should have it, too.
The idea is that the other location maintains a (subset) copy. Updates are exchanged incrementally. The way to treat conflicts depends on your application.
Upvotes: 4