noctonura
noctonura

Reputation: 13121

Should I use a string table to make database more efficient?

Let's say you have a database with a single table like...

---------------------------------------------
| Name    |  FavoriteFood                   |
---------------------------------------------
| Alice   | Pizza                           |
| Mark    | Sushi                           |
| Jack    | Pizza                           |
---------------------------------------------

Would it be more space-efficient to have an additional table called "Strings" that stores strings, and change the FavoriteFood column to an index in the string table. In the above example, "Pizza" looks like it is stored twice, but with the additional table, it would appear to be stored only once. Of course, please assume there are 1,000,000 rows and 1,000 unique strings instead of just 3 rows and 2 unique strings.

Edit: We don't know what the FavoriteFoods are beforehand: they are user-supplied. The programmatic interface to the string table would be something like...

String GetString(int ID) { return String at with Row-ID == ID }

int GetID(String s) {
  if s exists, return row-id;
  else {
    Create new row;
    return new row id;
  }
}

So the string-table seems more efficient, but do modern databases already do that in the background, so I can just do the simple one table approach and be efficient?

Upvotes: 4

Views: 2194

Answers (4)

Alex Yakunin
Alex Yakunin

Reputation: 6678

Pros for having a separate "Strings" table:

  • Likely, less space, if strings repeat really frequently
  • Likely, faster typical queries - because of less I\O

Cons:

  • You'll write more complex queries to achieve the same result
  • If the repetition factor is rather small, you'll get higher query execution time. To resolve each ID to string (or back), database server will perform a single lookup (seek operation) per each ID. So you get additional log(Strings.Count()) factor ~ for each query doing this.

But actually this is really effecient. E.g. most of full-text search engines use nearly this approach to store document-word maps.

Upvotes: 1

Mitch Wheat
Mitch Wheat

Reputation: 300559

You should be thinking in terms of what makes a good design in terms of your problem domain rather than efficiency (unless you expect to have tens of millions+ rows).

A well designed database should be in 3NF (third normal form). Only denormalise when you have identified a performance problem by measuring.

Upvotes: 4

Arvind
Arvind

Reputation: 717

In case you have another table to store the strings, it will be easier when you want to update the descriptions, for example, if u need to update all Pizzas to Italian Pizza, then u can do with one row update if u use a separate table. Another advantage would be translations, u can use the other table to store translations of the string in different languages and select the one based on the current language.

But the problem with that approach would be for inserts. U need to insert in both tables and also need to maintain the foreign key constraints, so it adds a bit of complexity to a simple table.

Upvotes: 2

Tyler McHenry
Tyler McHenry

Reputation: 76670

What are you measuring efficiency by? Assuming there is no other data associated with each FavoriteFood (in which case obviously you want two tables), a one-table approach is probably more time efficient, as the unnecessary join would incur an extra processing cost. On the other hand, a two-table approach may be more space-efficient, since it takes less space to store an index than a string, but that depends on how the particular database that you're using optimizes storage of repeated strings.

Upvotes: 5

Related Questions