Xi 张熹
Xi 张熹

Reputation: 11071

Can MySQL join XML field (or JSON)?

Please consider the following example. Let's say we are building a DB for an online book store. We have a Book table contains n records and a Type table contains m records. n is a very large number. m is small.

--------
Book
---------
BookId
BookName
BookType
---------

-------
Type
--------
TypeId
TypeName
---------

The traditional way to join these 2 table would be creating a 3rd table called BookType

---------- 
BookType
---------- 
BookTypeId 
BookId 
TypeId
----------

If we want to retrieve a book record with type, we can do:

select B.*, T.Name from Book B
inner join BookType BT on B.BookId = BT.BookId
inner join Type T on BT.TypeId = T.TypeId

Since the Book table is very large, so the BookType table is even larger. Since DB index is using B-Tree-like algorithms, the time cost will be: 2log(n) + Cm. Right? (With Book table and BookType table indexed)

However, if we can store the TypeId as a JSON array and use it for join, then we can get the data in one trip. The time will be log(n) + Cm which is at least twice as fast. The syntax can be something like:

select B.*, T.Name from Book B
inner join Type T on ParseJsonAsIntArray(BookType) = T.TypeId

I couldn't find a MySQL function like ParseJsonAsIntArray(). Why aren't they doing this? Apology if I am missing the obvious.

Upvotes: 1

Views: 696

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562438

No, there's no builtin function for parsing JSON in MySQL. The closest thing is ExtractValue() for XML data. This function uses an Xpath expression to pick out an element of an XML document. But regardless, there is no support in MySQL for indexing the elements within a semi-structured blob like XML or JSON. It's bound to be an inefficient query.

But first things first. You're trying to use denormalization to solve a problem which is actually an advantage of relational databases. The BookType table will be long, but the rows are individually pretty small. So it won't be as bad as you think.

It's a huge advantage for BookType to support indexed searches either by Book or by Type. When you denormalize, you basically make one of those lookups efficient, but you sacrifice the other lookup.

Also see my answer to Is storing a comma separated list in a database column really that bad?

Upvotes: 2

Related Questions