Reputation: 14269
I am currently considering an implementation of a service, that needs to store a lot of variable user data. That is: in one query a user might search for a,b, c, in the next a search goes over q, x, a and then someone comes up with additional data introducing z and y, which then again should be used for certain lookups.
If I would implement that as a standard SQL table, I had columns like a, b, c, ..., z, where most of those columns are null, and every few weeks someone would add another column for a new special case.
My solution for this issue is to describe those objects not through columns but as XML objects, which then only include those pieces of data that are relevant for the given object:
<myObject>
<a>ValueA</a>
<d>ValueD</d>
...
<q>ValueQ</q>
</myObject>
While this solves the issue from a data perspective, reducing clutter and making the data much more readable, I wonder how the performance of MySQL compares to the usage of columns.
I cannot find any information about an XML index that allows me to search for all objects, where i.E. <a>value</a>
. As look-up performance will be important for this service I need to know about what magnitude of performance hit implementing my data values as XML I have to face. Sadly I could not find any reliable performance analysis either.
What is the performance difference of using columns vs. saving data as XML in MySQL?
Upvotes: 1
Views: 542
Reputation: 10153
Data is not automatically indexed, if you query on ExtractValue
results the comparison is done in-memory and no index is used. You need to manually define these as "STORED" columns and this only works on 5.7+.
ALTER TABLE foo ADD COLUMN some_node_of_my_xml TEXT AS (ExtractValue(doc, "/some/path/to/my/node/@name")) STORED;
Then you can query over some_node_of_my_xml
like you'd do on any other column.
But in essence it's exactly the same as extracting the data and storing it at an application level instead of DB level.
What I'd suggest instead is to store these "dynamic" fields in a separate table, name/value pairs, and JOIN. Or implement ElasticSearch, or switch to a "traditional" document database like Mongo.
Upvotes: 1