generate-id() too slow for large document

Question

I have a large XML document containing annotated speech transcripts. A short fragment follows.

The basic task I need to do is to get the number of nodes between certain pairs of nodes. I've used the following stylesheet fragment to do this (illustrating with one specific pair of nodes).

This works fine on a short XML fragment like the one above and gives the correct result: Result: 6.

However, the actual XML document contains tens of thousands of nodes or more, so when I run the stylesheet on it, the result comes back very slowly. (It would probably take days to finish completely.) I suppose the problem is that on each evaluation of that line, the processor (Saxon) checks all nodes and generates IDs for them multiple times (i.e., exponentially), and that slows everything down.



Is there a way to speed up the process while still using generate-id()? Or do I need to get the number of nodes with some alternate approach?

John Bollinger · Accepted Answer

You do not need generate-id() just to avoid matching elements intervening between the start and end nodes. You are matching elements by their id attributes in the first place, and I see no reason not to use that more directly. For example,

You can simplify that by removing the [1] position predicate if you can rely on the element @ids to be unique in the document.
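As a minimal sketch of the idea, and not necessarily the exact expression above: the stylesheet below assumes the nodes of interest are non-nested elements named `w` with `id` attributes, and that the start and end ids are `w3` and `w10`. All of these names are hypothetical placeholders, so adjust them to the real document. It locates both endpoints once by `@id` and then counts the `w` elements strictly between them by subtracting two `preceding` axis counts, which is one possible way to do the counting without any call to generate-id().

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical id values; replace with the ids of the real start and end nodes. -->
  <xsl:param name="start-id" select="'w3'"/>
  <xsl:param name="end-id" select="'w10'"/>

  <xsl:template match="/">
    <!-- Assumption: the elements are named "w" and are not nested in one another. -->
    <!-- First w in document order carrying the start id. -->
    <xsl:variable name="start" select="(//w[@id = $start-id])[1]"/>
    <!-- Nearest w after the start node carrying the end id. -->
    <xsl:variable name="end" select="$start/following::w[@id = $end-id][1]"/>

    <xsl:text>Result: </xsl:text>
    <!-- w elements before the end node, minus those before the start node,
         minus the start node itself. -->
    <xsl:value-of select="count($end/preceding::w) - count($start/preceding::w) - 1"/>
  </xsl:template>
</xsl:stylesheet>
```

If the ids are known to be unique, both `[1]` predicates can be dropped without changing the result. Note that the sketch does not guard against a missing endpoint: if either id is not found, the subtraction yields a meaningless value, so a real stylesheet would want to test `$start` and `$end` first.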

If generate-id() is indeed the primary cause of your performance problem, then avoiding it altogether ought to provide a big boost.
