Reputation: 123
I need to generate a list of keywords for each document in a set of documents that are loaded into MarkLogic. I am considering running cts:distinctive-terms against the set of documents, but cannot figure out how to get a list of keywords for each document rather than a list of terms relevant to the set. Can anyone suggest a solution?
Upvotes: 2
Views: 298
Reputation: 7842
Were you using the score=logtf option? When I tried that, the scores of stop-words went up quite a bit. If you think about it, this makes sense: the database can no longer use IDF to weed them out. If you only want TF, though, you could filter using a stop-word list, as already suggested.
But logtfidf scoring should naturally penalize stop-words. You can set the min-val option or other options to tune the results. For example, here I set min-val to 27 because stop-words began to appear at 26. The right options will depend on the existing database content, because of IDF.
cts:distinctive-terms(
  text { 'I need to generate a list of keywords for each document in a set of documents that are loaded into MarkLogic. I am considering running cts:distinctive-terms against the set of documents, but cannot figure out how to get a list of keywords for each document rather than a list of terms relevant to the set. Can anyone suggest a solution?' },
  <options xmlns="cts:distinctive-terms"
           xmlns:db="http://marklogic.com/xdmp/database">
    <min-val>27</min-val>
    <use-db-config>false</use-db-config>
    <db:stemmed-searches>true</db:stemmed-searches>
    <db:word-searches>false</db:word-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
  </options>)/cts:term/cts:word-query/cts:text/string()
=>
load
set
solution
term
document
list
keyword
Upvotes: 3
Reputation: 20414
Simply iterate over the docs of interest and call cts:distinctive-terms for each doc separately:
for $doc in doc()
return
cts:distinctive-terms($doc)
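If you want plain keyword strings per document rather than the raw cts:class elements, the loop can be combined with the options and path expression from the other answer. This is only a sketch: the keywords and keyword wrapper elements are made up for illustration, and the min-val of 27 is copied from that answer, so it will almost certainly need tuning for your own content.

```xquery
xquery version "1.0-ml";

(: Sketch: one keyword list per document, labelled with the document URI.
   The <keywords>/<keyword> wrapper elements are hypothetical.
   min-val 27 is an assumed threshold borrowed from the other answer. :)
for $doc in fn:doc()
return
  <keywords uri="{xdmp:node-uri($doc)}">{
    for $text in cts:distinctive-terms(
        $doc,
        <options xmlns="cts:distinctive-terms"
                 xmlns:db="http://marklogic.com/xdmp/database">
          <min-val>27</min-val>
          <use-db-config>false</use-db-config>
          <db:stemmed-searches>true</db:stemmed-searches>
          <db:word-searches>false</db:word-searches>
          <db:fast-phrase-searches>false</db:fast-phrase-searches>
        </options>)/cts:term/cts:word-query/cts:text
    return <keyword>{fn:string($text)}</keyword>
  }</keywords>
```

Calling fn:doc() with no arguments walks every document in the database, so for anything large you would probably want to restrict the loop, for example with fn:collection() or a cts:search.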
HTH!
Upvotes: 3