Reputation: 21
I am fetching records from a bucket via the Map function, and the results contain different types of records (Entity1, Entity2, Entity3, and so on). Ex -
function (doc, meta) {
  var entity = meta.id.split('::')[0];
  if (entity == 'Entity1' && doc.status == 'NEW-DOC') {
    emit('Entity1', {'id': meta.id, 'status': doc.status, 'type': doc.type});
  }
  if (entity == 'Entity2' && doc.status == 'OLD-DOC') {
    emit('Entity2', {'id': meta.id, 'status': doc.status, 'type': doc.type});
  }
  if (entity == 'Entity3' && doc.status == 'DOC1') {
    emit('Entity3', {'id': meta.id, 'status': doc.status, 'type': doc.type});
  }
}
The meta ID looks like Entity1::10101, Entity1::10102, Entity2::10101, and so on, so I split the ID and take the zeroth index as the key, since it represents the entity. There could be any number of entities in the same bucket.
Reduce part - I receive the emitted results in the values parameter as an array of arrays. When using the built-in _count function, I get the count of records as 1000, but when using values.length I get 10 (it should have been 1000). I know this happens because CB re-reduces the array into multiple subarrays, but I need the entire array to iterate over. When I try to iterate, my loop ends at i=10 because values.length is 10. How can I iterate over the entire array at once?
Upvotes: 2
Views: 172
Reputation: 16177
I believe the issue here is a lack of understanding on how map-reduce is designed to work.
The first thing to keep in mind is that Couchbase is a parallel data store. If you've ever tried iterating over parallel data structures in your own code, you know that you have to create a copy of the structure so that you have something that doesn't shift underneath you while you're iterating. If you don't, many frameworks will throw exceptions to prevent inconsistent program execution.
Couchbase is not an array. To iterate over a parallel data structure, you have to be able to define the bounds of the structure at the instant you take the snapshot. Couchbase has no such notion - its design is for the parallel data structure to reach eventual consistency - meaning there are no finite bounds on the data defined at any point in time. Thus, there is really no notion of an array at all - only an (infinite) series of single objects.
Map-reduce is designed for parallel execution. In line with the above, Couchbase map-reduce functions must be able to run on any arbitrary subset of the data, possibly multiple times, with the goal of reaching an eventually-consistent state. This page provides some detail on how that needs to work. Basically, your reduce function must be designed so that it can handle incremental updates. Say the first pass processes 100 rows. Then, at some arbitrary point later, 20 new rows are processed. Couchbase will pass in the previous reduction of the 100 rows, plus the new 20, and expect you to produce a valid output.
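As a minimal sketch of what that incremental design looks like: a Couchbase view reduce function receives a third argument, rereduce, which tells it whether values holds raw emitted rows (false) or the outputs of earlier reduce calls (true). A hand-written count that behaves like the built-in _count must handle both cases; this is the reason values.length alone stops at 10 - at that point values contains 10 partial counts, not 10 rows.

```javascript
// Sketch of a rereduce-aware count, assuming the standard
// Couchbase view reduce signature: function (key, values, rereduce).
function reduce(key, values, rereduce) {
  if (rereduce) {
    // `values` holds the results of previous reduce calls (partial
    // counts), so sum them rather than taking values.length.
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // First pass: each element of `values` is one emitted row.
  return values.length;
}
```

In the first pass reduce('Entity1', [row, row, row], false) returns 3; in a later rereduce pass the inputs are those partial results, so reduce('Entity1', [100, 20], true) returns 120. The same pattern applies to any custom aggregation: the rereduce branch must combine previous outputs, never re-inspect original rows, because the original rows are not available at that stage.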
Upvotes: 1