Mayank

Reputation: 5728

High Availability and Database Design

This is a question that has been on my mind for a long time. How do Facebook, or any such website/app with more than a hundred million users, maintain their databases?

I believe that not everything can be put into a single database. If this is the case, should there be multiple databases handling different sections? Different sections like: one database for status updates, one for photographs, and one for users…

Can the database schema be made relational?

With 500 million+ users and growing, if each user has on average 10 text updates, that is at least 5 billion rows, which is perhaps only 10% of the data Facebook actually handles.

I read somewhere that Facebook runs 1800+ MySQL servers along with 800+ memcached servers. Should these DB instances be identical? How might they be designed?

Upvotes: 2

Views: 432

Answers (1)

tyronegcarter

Reputation: 3956

Facebook and other large companies that have huge databases employ database partitioning.

Partitioning is the distribution of a table over multiple subtables that may reside on different databases or servers in order to improve read/write performance. SQL Server partitioning is typically done at the table level, and a database is considered partitioned when groups of related tables have been distributed. Tables are normally partitioned horizontally or vertically.

  1. Horizontal Partitioning (the basis of sharding) improves overall read/write performance

    Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, and a view with a union can be created over both of them to provide a complete view of all customers (a SQL sketch of this follows the list).

    Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than splitting by columns (as for normalization). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

    There are numerous advantages to this partitioning approach. The total number of rows in each table is reduced, which reduces index size and generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This distributes the database over a large number of machines, spreading the load and greatly improving performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g. European customers vs. American customers), then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.

    Sharding is in practice far more difficult than this. Although it has been done for a long time by hand-coding (especially where rows have an obvious grouping, as in the example above), this is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it and in terms of identifying candidates to be sharded separately.

    Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a shard approach may also be useful.

    Shards compared to horizontal partitioning

    Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort), provided there is some obvious, robust, implicit way to identify in which table a particular row will be found without first needing to search the index, e.g. the classic example of the CustomersEast and CustomersWest tables, where the zip code already indicates where a row will be found.

    Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

    Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The hoped-for gains in efficiency would be lost if querying the database required both instances to be queried just to retrieve a simple dimension table. Beyond partitioning, sharding thus splits the large partitionable tables across the servers, while smaller tables are replicated into them en masse.

    This is also why sharding is related to a shared-nothing architecture: once sharded, each shard can live in a totally separate logical schema instance, physical database server, data center, or continent. There is no ongoing need for one shard to retain shared access to the unpartitioned tables held in other shards.

    This makes replication across multiple servers easy (which simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers would otherwise be a bottleneck.

    Obviously there is also a need for some notification and replication mechanism between schema instances, so that the unpartitioned tables remain as closely synchronized as the application demands. This is a complex choice in the architecture of sharded systems: approaches range from making these effectively read-only (updates are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding) and many options in between.

  2. Vertical Partitioning improves access to data

    In a vertically partitioned table, columns are removed from the main table and placed in child tables, a process sometimes called denormalization. This makes tables narrower, allowing more rows to fit on a database page, so a single I/O operation returns more rows and data-access performance improves. The trade-off is that you may need joins to return the removed columns (see the second sketch after this list).
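
To make point 1 concrete, here is a minimal SQL sketch of the ZIP-code example above; the table and column definitions are assumptions for illustration, and exact syntax varies by database engine:

    -- Two partition tables holding disjoint row ranges by ZIP code
    CREATE TABLE CustomersEast (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100),
        zip_code    INT CHECK (zip_code < 50000)
    );

    CREATE TABLE CustomersWest (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100),
        zip_code    INT CHECK (zip_code >= 50000)
    );

    -- A union view presents the two partitions as one logical table
    CREATE VIEW AllCustomers AS
        SELECT customer_id, name, zip_code FROM CustomersEast
        UNION ALL
        SELECT customer_id, name, zip_code FROM CustomersWest;

With the CHECK constraints in place, many engines can prune the irrelevant partition when a query on the view filters on zip_code.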
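
And a similar sketch for the vertical case in point 2: the wide, rarely read columns move to a child table sharing the primary key (an independent sketch; names and types are again illustrative):

    -- Narrow main table: frequently used columns only, so more rows fit per page
    CREATE TABLE Customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100),
        zip_code    INT
    );

    -- Child table holds the wide, rarely read columns
    CREATE TABLE CustomerProfiles (
        customer_id INT PRIMARY KEY REFERENCES Customers(customer_id),
        biography   TEXT,
        photo       BLOB
    );

    -- A join reassembles the full row when the wide columns are needed
    SELECT c.name, p.biography
    FROM Customers c
    JOIN CustomerProfiles p ON p.customer_id = c.customer_id;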

In addition to partitioning, of course, there is replication, making multiple copies of the data available.


Effect on Relational Database Schemas

Sharding does destroy your relational database, and that is a good thing. The idea behind sharding is to distribute data to several databases based on certain criteria. This could, for example, be the primary key: all entities whose keys begin with 1 go to one database, those with 2 to another, and so on (often a modulo function on the key is used, or groups based on business data such as customer location or function). Several reasons exist for sharding, the main two being better performance and a lower impact of database crashes: if sharding is by name, only persons whose names start with S are affected when that shard goes down.
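
As a small sketch of the modulo variant (the four-shard count and the users table are assumptions for illustration), the routing rule is just an expression over the key, which the application evaluates before choosing a connection:

    -- Modulo routing: user_id % 4 assigns each row to one of four shard databases
    -- (standard SQL spells this MOD(user_id, 4))
    SELECT user_id,
           user_id % 4 AS shard_id   -- 0..3 -> shard_0 .. shard_3
    FROM users;

    -- e.g. user_id 123457 % 4 = 1, so that row lives on, and is fetched from, shard_1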

Relational databases were the tool of choice for data storage for several decades, but they do more than store data. Even read operations can be split into several functions. There are at least three kinds of database read queries (sketched in SQL below):

  1. Data graph building queries: With these you get your data out of the database, e.g. customers together with their addresses.

  2. Aggregation queries: How many orders were placed in August, aggregated by product category?

  3. Search queries: Give me all customers who live in New York
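
Sketched in SQL under an assumed customers/orders schema (all names here are illustrative), the three kinds look like this:

    -- 1. Data graph building: fetch an entity together with its related rows
    SELECT c.name, a.street, a.city
    FROM customers c
    JOIN addresses a ON a.customer_id = c.customer_id
    WHERE c.customer_id = 42;

    -- 2. Aggregation: summarize many rows into a few
    SELECT product_category, COUNT(*) AS order_count
    FROM orders
    WHERE order_date >= '2010-08-01' AND order_date < '2010-09-01'
    GROUP BY product_category;

    -- 3. Search: find all rows matching a predicate
    SELECT customer_id, name
    FROM customers
    WHERE city = 'New York';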

Sharding does away with the second and third kinds of query and reduces databases to data storage. Because the shards are different databases on different systems, you can't aggregate across them (as you could in a cluster) without custom code spanning the systems, and you cannot search with one query (only with several, one to each database). Databases have led to the notion that search and retrieval are linked together and should be dealt with together; most people think of retrieval and search as the same thing. This has blocked the development of new technologies. Sharding, S3, Dynamo, and memcached have changed this perception recently. Rickard, of Qi4j fame, said this:

Entities are really cool. We have decided to split the storage from the indexing/querying, sort of like how the internet works with websites vs Google, which makes it possible to implement really simple storages. Not having to deal with queries makes things a whole lot easier.

Thus, storage and search are two different things, and any sizable web-related company handles them differently.
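
To make this concrete, consider the search query from the list above on a sharded system: no single database holds all the customers, so the same statement must be sent to every shard and the partial results merged outside the database, a scatter-gather pattern (shard names here are assumed for illustration):

    -- Run on EVERY shard (shard_0 .. shard_3); each returns only its own rows
    SELECT customer_id, name
    FROM customers
    WHERE city = 'New York';

    -- The application (or custom middleware) then concatenates the per-shard
    -- result sets. Likewise, an aggregation must combine per-shard COUNT(*)
    -- values in application code, since no shard sees the complete data set.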

People have talked about splitting storage and search for some time now. Search engines like Lucene have driven searching out of databases, but the notion of combined store & search is still prevalent. Sharding as a mechanism for more performance and lower risk will move into many web companies and reduce databases to a storage mechanism, dropping the aggregation (data warehouse and reporting) and search parts. Those roles can be better filled by real data warehouse servers like Mondrian, search services based on Lucene, or semantic engines like Sesame. And storage itself might move from relational databases to simple stores like Amazon SimpleDB, JDBM, or NoSQL systems.

Upvotes: 10
