Vish

Reputation: 4492

Joins vs multiple copies of the data: Performance

I was just reading http://s.niallkennedy.com/blog/uploads/flickr_php.pdf about Flickr's infrastructure, and this is what it said:

JOIN’s are slow
• Normalised data is for sissies
• Keep multiple copies of data around
• Makes searching faster

Is that true, or is it just their way of managing their DBs? If I'm only looking for performance, is it better not to normalise?

Upvotes: 1

Views: 109

Answers (1)

Brent Baisley

Reputation: 12721

Joins become a performance issue on large data sets. It's not something to worry about unless you are actually seeing slow queries. Normalized data has big advantages (each fact is stored once, so an update can't leave stale copies behind), but nobody ever goes to fifth normal form. Second or third normal form is typical.
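To make the normalized side concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users`/`photos` tables and the helper names are hypothetical illustrations, not Flickr's actual schema. Each fact is stored once, so reads need a JOIN, but a rename is a single UPDATE.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory, normalized database with a few sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE photos (id INTEGER PRIMARY KEY,
                             owner_id INTEGER NOT NULL REFERENCES users(id),
                             title TEXT NOT NULL);
        -- Index the join column so the JOIN below can use it.
        CREATE INDEX photos_owner ON photos(owner_id);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO photos VALUES (?, ?, ?)",
                     [(10, 1, "sunset"), (11, 1, "beach"), (12, 2, "cat")])
    return conn

def photos_with_owner(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    # The owner's name lives only in `users`, so reading it back needs a JOIN.
    return conn.execute("""
        SELECT p.title, u.name
        FROM photos AS p JOIN users AS u ON u.id = p.owner_id
        ORDER BY p.id
    """).fetchall()

def rename_user(conn: sqlite3.Connection, user_id: int, new_name: str) -> None:
    # One UPDATE fixes the name everywhere, because it is stored only once.
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
```

Calling `rename_user(conn, 1, "alicia")` is immediately reflected in every row `photos_with_owner(conn)` returns; there are no copies to keep in sync.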

When you do have performance issues, consider de-normalizing what you have and keeping copies of the data optimized for retrieval, especially data that doesn't change.
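One way to picture "copies of data optimized for retrieval" is a read table that duplicates the owner's name onto every photo row, rebuilt from the normalized tables. This is a hypothetical sketch with Python's `sqlite3`; the `photo_listing` table and helper names are made up for illustration. Reads skip the JOIN, but the copy goes stale until it is refreshed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE photos (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, title TEXT NOT NULL);
-- Denormalized copy: owner_name is duplicated on every photo row.
CREATE TABLE photo_listing (photo_id INTEGER PRIMARY KEY,
                            title TEXT NOT NULL,
                            owner_name TEXT NOT NULL);
CREATE INDEX listing_owner ON photo_listing(owner_name);
"""

def make_db() -> sqlite3.Connection:
    """In-memory database with the normalized tables, the copy table, and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO photos VALUES (?, ?, ?)",
                     [(10, 1, "sunset"), (11, 1, "beach"), (12, 2, "cat")])
    return conn

def refresh_listing(conn: sqlite3.Connection) -> None:
    # Rebuild the read-optimized copy from the normalized source of truth.
    # The JOIN is paid here, once, instead of on every read.
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM photo_listing")
        conn.execute("""
            INSERT INTO photo_listing (photo_id, title, owner_name)
            SELECT p.id, p.title, u.name
            FROM photos AS p JOIN users AS u ON u.id = p.owner_id
        """)

def listing_for(conn: sqlite3.Connection, owner_name: str) -> list[str]:
    # Reads hit a single table and need no JOIN.
    rows = conn.execute(
        "SELECT title FROM photo_listing WHERE owner_name = ? ORDER BY photo_id",
        (owner_name,))
    return [title for (title,) in rows]
```

The trade-off shows up on writes: after renaming a user in `users`, `listing_for` keeps returning the old name's photos until `refresh_listing` runs again. That is why this pays off mostly for data that rarely changes.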

Flickr probably has few updates, so keeping multiple copies of the data adds little overhead. They also have the luxury of eventual consistency: data doesn't have to replicate in real time.

Upvotes: 3
