thichxai

Reputation: 1147

marklogic - use distinct-values

I am trying to get distinct values from documents, but the query always returns duplicate values.

for $uri in cts:uris((),
                     (),
                     cts:and-query(
                       cts:collection-query("/citation/company")
                 )
         )[1 to 1000]
return distinct-values( doc($uri)/PerformingOrganizations/Name)

result: 
   EARTH RESEARCH LLC
   EARTH RESEARCH LLC
   EARTH RESEARCH LLC
   EARTH RESEARCH LLC
   EARTH RESEARCH LLC

Why does distinct-values return duplicate values? How can I eliminate the duplicates from the result? Thanks in advance.

Upvotes: 1

Views: 884

Answers (2)

Antony

Reputation: 976

I hope this XQuery is helpful. Try this:

let $uris := cts:uri-match(('*.xml'),(),cts:collection-query("/citation/company"))
let $name := for $uri in $uris
             return doc($uri)/PerformingOrganizations/Name/text()

return fn:distinct-values($name)    

Upvotes: 0

wst

Reputation: 11771

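You are calling distinct-values inside the iteration, so it runs once for every $uri and only ever sees the Name values of a single document. Values repeated across documents are never compared with each other. First collect your sequence of values, then call distinct-values once for all of them.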

let $values :=
  for $uri in cts:uris((), 'limit=1000',
    cts:and-query(cts:collection-query("/citation/company")))
  return doc($uri)/PerformingOrganizations/Name 
return distinct-values($values)

Also, cts:uris is a lexicon function that returns every result in the index unless it is specifically limited by an options parameter. With a predicate such as [1 to 1000], the lexicon call returns all of its results first, and only then is the sequence cut down to the first 1000. With the limit option, the function only ever returns the first 1000 results. For indexes with many values, not using these options can cause performance problems.
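
To see the scoping issue in isolation, here is a minimal sketch that does not touch the database. It assumes three inline elements standing in for documents; the second organization name is made up for illustration.

```xquery
(: Three stand-in "documents"; "EXAMPLE LABS INC" is a made-up name. :)
let $docs := (
  <PerformingOrganizations><Name>EARTH RESEARCH LLC</Name></PerformingOrganizations>,
  <PerformingOrganizations><Name>EARTH RESEARCH LLC</Name></PerformingOrganizations>,
  <PerformingOrganizations><Name>EXAMPLE LABS INC</Name></PerformingOrganizations>
)
(: distinct-values called once per item: each call sees a single Name. :)
let $per-item := for $d in $docs return distinct-values($d/Name)
(: distinct-values called once on the whole collected sequence. :)
let $overall := distinct-values($docs/Name)
return (count($per-item), count($overall))
```

The first count should be 3, because each per-item call returns its single value untouched, and the second should be 2, because the repeated name is collapsed only when all values are compared together.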

Upvotes: 3
