Reputation: 4378
I have an application that will generate a log entry for every search query it processes (from Solr) so that I can calculate certain statistics for my search engine, for instance the number of queries without results, the average number of hits, et cetera. Now I'm wondering how best to perform this analysis. Load is estimated at a peak of ten thousand searches per day, with statistics generated over a weekly period. In other words, I'm looking for the best way to calculate statistics on up to one hundred thousand XML files.
The process will be controlled by Apache Camel, and I'm currently thinking that XQuery will be my best bet for tackling this problem. As I'm still working on establishing the schema, I can't run any real-world tests, so I wanted to garner some opinions on the best approach to take before I dive in. Some questions:
Upvotes: 0
Views: 561
Reputation: 9789
Do they have to be in XML format? I would very strongly explore loading these statistics into a database of some sort: either a normal relational database if the fields/categories of information are regular, or one of the schema-less NoSQL databases if they are not. This would make deriving statistics much easier.
You can even load it back into Solr (in a separate core) using either a concrete schema or dynamic fields, if your logged criteria may change.
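To illustrate the database route, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `search_log`, the two columns, and the sample records are all made up for illustration, since the real schema is not settled yet; any relational database would work along the same lines.

```python
import sqlite3

# Hypothetical log records: (query text, number of hits).
# The real fields depend on the log schema, which is still undecided.
sample_log = [
    ("solr faceting", 12),
    ("xquery collection", 0),
    ("camel route", 30),
    ("misspeled term", 0),
]


def load_and_summarise(records):
    """Load records into an in-memory SQLite table and compute basic statistics."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE search_log (query TEXT NOT NULL, hits INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO search_log (query, hits) VALUES (?, ?)", records
        )
        # One aggregate query yields all three figures in a single pass.
        total, zero_hits, avg_hits = conn.execute(
            "SELECT COUNT(*), "
            "       COALESCE(SUM(CASE WHEN hits = 0 THEN 1 ELSE 0 END), 0), "
            "       AVG(hits) "
            "FROM search_log"
        ).fetchone()
    finally:
        conn.close()
    return {"total": total, "zero_hits": zero_hits, "avg_hits": avg_hits}


summary = load_and_summarise(sample_log)
print(summary)
```

Once the entries are rows, adding a new statistic is a matter of adding another aggregate expression rather than writing another pass over the files.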
Upvotes: 0
Reputation: 163468
Either XSLT 2.0 or XQuery 1.0 can handle this in principle, but the performance depends on the actual volumes and on the complexity of the queries. Generally (I know it sounds banal), XSLT is better at transformation (generating a new document from each source document) while XQuery is better at query (extracting a small amount of information from each source document). There's no particular point in merging all the small documents into one big document. I would also say that there's not much point in putting them in a database unless either (a) you really need the cross-indexing this will provide, or (b) you're going to use the documents repeatedly over a period of time.
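The "extract a little from each small document, never merge them" pattern can be sketched roughly as follows. This uses Python rather than XQuery purely for illustration, and the `<search>`/`<hits>` element names and the `query-N.xml` file names are assumptions, because the actual log schema has not been defined.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def weekly_stats(log_dir):
    """Parse each *.xml file in log_dir on its own and keep only running totals."""
    total = 0
    zero_hits = 0
    hit_sum = 0
    for path in sorted(Path(log_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # Hypothetical schema: the hit count lives in a <hits> child element.
        hits = int(root.findtext("hits", default="0"))
        total += 1
        hit_sum += hits
        if hits == 0:
            zero_hits += 1
    avg = hit_sum / total if total else 0.0
    return {"total": total, "zero_hits": zero_hits, "avg_hits": avg}


# Small demonstration with three made-up log files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i, hits in enumerate([5, 0, 7]):
        Path(tmp, f"query-{i}.xml").write_text(
            f"<search><query>q{i}</query><hits>{hits}</hits></search>",
            encoding="utf-8",
        )
    demo_stats = weekly_stats(tmp)

print(demo_stats)
```

Because only three counters are kept between files, memory use stays flat no matter how many log files the week produces.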
Upvotes: 3
Reputation: 6218
Answers in the respective order of the questions:
fn:collection() function
Upvotes: 2