Reputation: 21

map-reduce function in CouchDB

I have a Java program that reads all the words of a PDF file. I saved the words, together with their page numbers, in a database (CouchDB). Each word with its page number is a separate document in CouchDB. Now I want to write a map and a reduce function that list each word with the page numbers on which it occurs; if a word occurs more than once on a page, I want just one entry for that page. The result should be a row with the word and a second row with a list of page numbers (a comma-separated string). How can I do this with a map-reduce function (filtering out duplicate page-number entries)? Thanks for your help.

Upvotes: 2

Answers (1)

Marek Kowalski

Reputation: 1792

There is surely more than one way to do this. I'd go for something simple. Let's say your documents look something like this:

{ 'type': 'word-index', 'word': 'Great', 'page_number': 45 }

This is the result of finding the word 'Great' on page 45. Now your view index is built by a map function:

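function (doc) {
    // Only index the documents that record a word occurrence.
    if (doc.type == 'word-index') {
        // Composite key [word, page]; the value is unused because "_count" only counts rows.
        emit([doc.word, doc.page_number], null);
    }
}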

For the reduce part, just use the built-in "_count" function.
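To show how the map function and the "_count" reduce fit together, here is a minimal sketch of a design document that holds both. The design document ID `_design/words` and the view name `by_word_and_page` are made up for illustration; use whatever names suit your database.

```javascript
// Hypothetical design document: "_design/words" and "by_word_and_page"
// are illustrative names, not anything CouchDB requires.
const designDoc = {
  _id: "_design/words",
  views: {
    by_word_and_page: {
      // CouchDB stores the map function as a string of JavaScript source.
      map: `function (doc) {
        if (doc.type == 'word-index') {
          emit([doc.word, doc.page_number], null);
        }
      }`,
      // Built-in reduce that counts the rows emitted for each key.
      reduce: "_count",
    },
  },
};

// The JSON below is what you would save into the database as the design document.
console.log(JSON.stringify(designDoc, null, 2));
```

Once this document is saved in the database, the view can be queried under the name chosen above.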

Now, to get the list of all occurrences of the word "Great" in your book, just query your view with startkey=["Great"] and endkey=["Great", {}]. The result would look something like this:

["Great", 45], 4
["Great", 70], 7

This means that the word "Great" appeared 4 times on page 45 and 7 times on page 70. You can build the comma-separated list you need from it. The number of occurrences is a bonus.
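Building the comma-separated list on the client could look roughly like the sketch below. It assumes the usual view response shape, with a `rows` array whose entries have `key` and `value`, and reuses the two sample rows from above; the helper name `pagesByWord` is made up for this example.

```javascript
// Sample response in the shape CouchDB returns for a grouped view query
// (data taken from the two example rows above).
const response = {
  rows: [
    { key: ["Great", 45], value: 4 },
    { key: ["Great", 70], value: 7 },
  ],
};

// Hypothetical helper: collects the page numbers for every word.
// In a grouped result each [word, page] key appears only once,
// so no extra de-duplication is needed here.
function pagesByWord(rows) {
  const pages = new Map();
  for (const row of rows) {
    const [word, page] = row.key;
    if (!pages.has(word)) {
      pages.set(word, []);
    }
    pages.get(word).push(page);
  }

  // Turn each list of pages into a comma-separated string.
  const result = {};
  for (const [word, list] of pages) {
    result[word] = list.join(",");
  }
  return result;
}

console.log(pagesByWord(response.rows).Great); // prints "45,70"
```

Because the view already collapses repeated words on the same page into one row, the client only has to join the page numbers.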

--edit--

You also have to use group_level=2 in your query. If you don't, the rows are not grouped by key, and the query simply returns a single row with the total count of all the matching documents.
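Putting the query parameters together, a request URL could be built as in the sketch below. The server address, the database name `book`, the design document `words` and the view name `by_word_and_page` are assumptions for illustration. The key parameters must be JSON-encoded, which is why `JSON.stringify` is used.

```javascript
// Hypothetical names: database "book", design document "words",
// view "by_word_and_page". Replace them with your own.
function buildWordQueryUrl(baseUrl, word) {
  const url = new URL(`${baseUrl}/book/_design/words/_view/by_word_and_page`);

  // startkey and endkey are JSON values; {} sorts after any page number,
  // so the range covers every [word, page] key for this word.
  url.searchParams.set("startkey", JSON.stringify([word]));
  url.searchParams.set("endkey", JSON.stringify([word, {}]));

  // Group by the full [word, page] key so that each page gets its own row.
  url.searchParams.set("group_level", "2");

  return url.toString();
}

console.log(buildWordQueryUrl("http://localhost:5984", "Great"));
```

Sending a GET request to the resulting URL should return one row per page on which the word occurs.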

Upvotes: 4
