Jeffrey Cameron
Jeffrey Cameron

Reputation: 10285

NHibernate and string primary keys

We have a legacy database that uses strings as primary keys. I want to implement objects on top of that legacy database to better implement some business logic and provide more functionality to the user.

I have read in places that using strings for primary keys on tables is bad. I'm wondering why this is? Is it because of the case-sensitivity issues? character sets?

... why is this particularly bad for NHibernate?

... and following up on that ... if strings do make bad primary keys, is it worth it to replace the primary keys in the database with ints or GUIDs or the like? (we only have about 25-30 tables involved)

Upvotes: 2

Views: 1868

Answers (3)

Daniel Dinnyes
Daniel Dinnyes

Reputation: 5027

Using strings or chars adds a huge amount of accidental complexity to your system. Consider these questions:

  • how to handle case sensitivity;
  • how to handle padding. NHibernate lets you insert a shorter string, and the database will silently add padding to it, but it won't be reflected in your persisted entity. Trying to fetch the entity again with the in-memory ID returns null;
  • how to handle encoding issues. C# uses unicode strings, your database migth not. Can you tell how the conversion will be handled? I don't think so.
  • synthetic integer keys can be autogenerated by most databases without extra effort. With strings you most probably create them "by hand". Unless you hide them behind a Factory (in the DDD sense), the resulting code will clutter your domain model.

Though the performance overhead mentioned by andy K can diminish because of indexing, still many times you do ID comparisions in-memory (hash-maps?) and the DB optimizations do not apply there.

I have been working on a project with a legacy database having string primary keys and no foreign keys at all. We are not allowed to thouch the old schema because a legacy app depends on every minor aspects of it. I feel that the string primary keys hurt the consistency more than the missing foreign keys, since NHibernate handles the later quite gracefully.

Upvotes: 0

Andriy Volkov
Andriy Volkov

Reputation: 18923

Andy K seems to imply that strings are not stored as bytes. That would be funny! In fact it all depends on how long the string PK is and what collation you use. It might be even faster than bigint or int identity and will almost definitely be faster than Guids. If these strings are something you'd have to search by anyway, then you would need an index (perhaps even clustered index) on them anyway, so why not make them PKs!

Upvotes: 1

andy K
andy K

Reputation: 86

Okay, I will have a stab at this. I will give a couple of quick caveats - I am not an expert on databases and my experience is with Hibernate (Java) rather than NHibernate, but here goes.

I think the issue of primary keys as strings is to do with the SQL data-type that is used to represent them in the database. Because the primary key is used all the time when inserting, querying and so on, the database engine has to spend lots of time comparing primary keys. If you are using numbers, these are simply stored as bytes which computers are really good at doing stuff with quickly. As soon as you start using strings, the cost of these operations (comparisons mainly) goes up significantly. Even if the database engine is using really neat strategies to compare keys, it will still always be faster to compare bytes as bytes rather than strings.

On modern hardware though, this is becoming much less an issue than it used to be, and with indexes the problem almost disappears.

I don't know for sure about why this is really bad in Hibernate (and NHibernate) but in my experience, because my application has a complex graph of objects that often have references to other persisted objects, often as lists or sets, the references are all stored using the ID of the other object, and because of the rules I have in place for cascading saves, fetching and so on, this will mean that the primary keys are being used ALL the time. Hibernate - which I quite like - tends to do exactly what its told to, and sometimes people (especially me!) tell it to do really dumb things. As a result, even seemingly simple updates or queries end up generating quite complex SQL.

So - in summary - strings as primary keys are bad due to cost of simple operations on them and using Hibernate may magnify this. In practice though, modern database engines have lots of neat strategies to ensure that the performance hit is not that bad. (Postgres - and presumably others - by default create indexes for primary keys)

For your follow up - should you replace your keys? Well, that depends on the performance of your application. If performance is critical, then for a high volume and very intensive application it may be a good idea, otherwise there will probably be minimal benefit, with the downside of having to spend time changing all your tables. You could expect to get much better results refining the strategies you are using with NHibernate (ie fetching strategies and when you are cascading saves and so on).

Upvotes: 5

Related Questions