Tutan Ramen
Tutan Ramen

Reputation: 1242

Efficient XQuery query to determine the documents where an element does NOT exist

Let's say I have ~50 million records in a collection like this:

<record>
  <some_data>
    <some_data_id>112423425345235</some_data_id>
  </some_data>
</record>

So I have maybe a million records (bad data) that look like this:

<record>
  <some_data>
  </some_data>
</record>

With some_data element being empty. So if I have an element-range-index setup on some_data_id, what's an efficient XQuery query that will give me all the empty ones to delete?

I think what I'm looking for is a query that is not a FLWOR where you check the existence of children records for each element, as I think that is inefficient (i.e. pulling the data back and then filtering)?

Whereas if I did it in the cts:search query then it would be more efficient, as in filter the data before pulling it back?

Please write a query that can do this efficiently and confirm whether or not my assumptions about FLWOR statements are correct.

Upvotes: 2

Views: 467

Answers (1)

wst
wst

Reputation: 11773

I don't think you need a range index to do this efficiently. Using the "universal" element indexes via cts:query constructors should be fine:

cts:element-query(xs:QName('record'),
  cts:element-query(xs:QName('some_data'),
    cts:not-query(cts:element-query(xs:QName('some_data_id'), cts:and-query(())))
  )
)

Upvotes: 5

Related Questions