nogridbag
nogridbag

Reputation: 3703

Setting up database sharding - no cross-database queries

I'm developing a web application in which, for each client (a small-large organization), they do not (and should not) be able to query into other client's records.

Keeping the data in a single database would be easy and make updating and maintenance simple (until I run into scalability issues). But I want to future-proof the application now. If each client's data was contained in isolated databases, each client would have better performance and should scale better. I'm not sure if it's equivalent to database "sharding" since I'm not dividing up a single schema across multiple databases. I essentially would have a single schema duplicated in every database (just like back in the day when you would ship software on a CD - each with its own database).

I've read a bit into this, so I have an idea of the general concept. But, I have lots of questions in my head. I'm not exactly sure how transparent this process can be. Or if I will run into a maintenance nightmare updating hundreds of schemas whenever I want to roll out a change.

Really, I'm just looking for a simple "complete" example (hopefully using spring/java).

  1. I would imagine I can have a single application server to start with a single datasource, let's say a mysql instance with a single table mapping a userid to a database.

    • userid
    • database/shard-id

    For every request (query), ignoring database caching, would it have to look up the shard id for the user? Or is this something it can do one time initially per session and talk directly to the target database? (As you might be able to tell I'm not strong with server side stuff).

  2. Can someone give a high level overview of how this might possibly be wired up in Spring? Currently, my architecture is very simple. I have simple Spring component DAO's using jdbctemplate. The DAO's datasource is injected (datasource is configured on applicationContext.xml). The DAO's are autowired into my service classes. Pretty standard stuff.

  3. Let's say I get the previous step working and now I have to change the schema. Are there management tools out there I can use to apply the schema change once and have it propagate out to 100 other databases?

I'm using MySQL. I believe "MySQL Proxy" might be able to solve problems 1 and 2. Does anyone have any experience with this? I suppose it cannot handle managing schema updates so I may have to roll my own solution.

Thanks!

Upvotes: 1

Views: 1958

Answers (5)

allen
allen

Reputation: 1654

You can use DDAL to implemente accessing different databases in DAL, and it doesn't rely on spring's datasource and transaction management. And there is a demo project to show how to use it: https://github.com/hellojavaer/ddal-demos . you can have a try.

Upvotes: 0

Will Hartung
Will Hartung

Reputation: 118593

I can't talk to Spring, as I don't use it.

If it were me, with my Java EE hat on, I'd simply use a JNDI Datasource, create one for each client, and look it up via the client name or whatever identifier you're using to distinguish your clients.

Now, I'm sure that can be done with Spring, I just can't tell you how.

How well the common Database Connection Pool implementations handle "100 databases" with however many connections, is a different question as well (visions of several hundred open connection sockets come to mind). I can't talk to that either, not having done it.

But after that, since each pool points to a separate database, then you're basically all done. Each pool can have it's own configuration so you can move the DBs to different hosts, etc.

That would be my first cut at the problem until it failed under test, but I'm guessing that the failure point will be the DB pool implementation or related to that. Everything else is generic DB server stuff and Java.

Upvotes: 0

Kalpesh
Kalpesh

Reputation: 11

I am using spring and sharding at my company the idea is that

  1. You would implement a ShardDataSourceManager that would be basically pool of connection pools and you would lookup a data-source by shard id.
  2. You would define your own Transactional annotations and annotate methods with it
  3. You need to write an interceptor at dao layer that would read annotations on method and some context info. From the context info you would lookup shard id and lookup data-source and inject into a thread local.
  4. The dao layer when it looks up data-source would look into thread local to construct a jdbc template and execute queries on it.

Upvotes: 1

Subhash
Subhash

Reputation: 326

I dont know spring much so cant talk much about it. But for database sharding, I suggest you have a look at this post on High Scalability

New Relic Architecture - Collecting 20+ Billion Metrics a Day

It talks about nice sharding strategy and how it helps when load varies. Also look at comments sections where he gives more details of sharding.

Upvotes: 0

Liran Zelkha
Liran Zelkha

Reputation: 129

it's kind of a sharding/multi-tenancy situation. You will have a maintenance nightmare, and it will require writing allot of code. There are third parties you can use - you can try ScaleBase (disclosure: I work there) they do exactly what you describe, in a way transparent to your application.

Upvotes: -1

Related Questions