Reputation: 2852
Friends!
I am using MongoDB in a Java project via Spring Data. I use Repository interfaces to access data in collections. For some processing I need to iterate over all elements of a collection. I can use the fetchAll method of the repository, but it always returns an ArrayList.
However, one of the collections is expected to be large: up to 1 million records of at least several kilobytes each. I suppose I should not use fetchAll in such cases, but I could find neither a convenient method returning an iterator (which would allow the collection to be fetched partially), nor a convenient method accepting a callback.
I've only seen support for retrieving such collections in pages. Is that the only way to work with such collections?
Upvotes: 37
Views: 47509
Reputation: 1038
You can still use mongoTemplate to access the collection and simply use a DBCursor:
DBCollection collection = mongoTemplate.getCollection("boundary");
DBCursor cursor = collection.find();
while (cursor.hasNext()) {
    DBObject obj = cursor.next();
    Object object = obj.get("polygons");
    // do something with the object
}
Updated in 2023: getObjectMapper() returns a Jackson ObjectMapper; you must register the Spring Data GeoJsonModule on it as a serializer/deserializer.
MongoCollection<org.bson.Document> collection = mongoTemplate.getCollection("postals");
FindIterable<Document> cursor = collection.find();
cursor.forEach(obj -> {
    Document geoJsonPolygonDocument = (Document) obj.get("polygon");
    if (geoJsonPolygonDocument != null) {
        try {
            GeoJsonPolygon geoJsonPolygon = getObjectMapper()
                    .readValue(geoJsonPolygonDocument.toJson(), GeoJsonPolygon.class);
            // do something with geoJsonPolygon
        } catch (JsonProcessingException e) {
            throw new RuntimeException(e); // readValue declares a checked exception
        }
    }
});
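For reference, a minimal sketch of what a getObjectMapper() helper could look like. The method name and wiring here are my own assumption, not part of the original answer; the key point is registering GeoJsonModule with Jackson:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.mongodb.core.geo.GeoJsonModule;

// Hypothetical helper: builds a Jackson ObjectMapper that can read
// Spring Data GeoJSON types such as GeoJsonPolygon.
private ObjectMapper getObjectMapper() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(new GeoJsonModule()); // registers the GeoJSON deserializers used by readValue(...)
    return mapper;
}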
Upvotes: 13
Reputation: 11
Due to the way MongoDB cursors work, if you have a long-running process you can lose the cursor.
I recommend using paging:
final int pageSize = 1000;
var paging = Pageable.ofSize(pageSize);
do {
    Page<T> page = repository.findAll(paging);     // Retrieve the page items
    page.forEach(item -> this.processItem(item));  // Do the per-item job
    paging = page.nextPageable();                  // On the last page: returns Pageable.unpaged()
} while (paging.isPaged());                        // Unpaged.isPaged() returns false
And for the repository, there are 2 options:
// Option 1: use the Spring Data interface
@Repository
public interface YourDao extends PagingAndSortingRepository<T, ID> {
    // The extension already provides the implementation of:
    // Page<T> findAll(Pageable pageable);
}

// Option 2: create your own implementation
public class YourDaoImpl implements YourDao {

    @Override
    public Page<T> findAll(Pageable pageable) {
        final var query = new Query().with(pageable);
        var items = mongoTemplate.find(query, T.class);
        return PageableExecutionUtils.getPage(
                items,
                pageable,
                () -> mongoTemplate.count(Query.of(query).limit(-1).skip(-1), T.class));
    }
}
Upvotes: 0
Reputation: 369
Since this question got bumped recently, this answer needs some more love!
If you use Spring Data Repository interfaces, you can declare a custom method that returns a Stream, and it will be implemented by Spring Data using cursors:
import java.util.stream.Stream;

public interface AlarmRepository extends CrudRepository<Alarm, String> {

    Stream<Alarm> findAllBy();
}
So for a large amount of data you can stream the documents and process them one by one without hitting memory limits.
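A minimal usage sketch (processAlarm(...) is just a placeholder of mine): consume the stream inside a try-with-resources block so the underlying cursor is closed when you are done.
// Hypothetical caller; processAlarm(...) stands in for your own logic.
public void processAllAlarms(AlarmRepository alarmRepository) {
    // try-with-resources closes the underlying MongoDB cursor
    try (Stream<Alarm> alarms = alarmRepository.findAllBy()) {
        alarms.forEach(alarm -> processAlarm(alarm));
    }
}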
See https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongodb.repositories.queries
Upvotes: 23
Reputation: 446
This answer is based on: https://stackoverflow.com/a/22711715/5622596
That answer needs a bit of an update, as the way PageRequest is constructed has changed.
With that said, here is my modified response:
// PageRequest.of(page, size) is zero-based, so start with page 0
int pageNumber = 0;
// Change this value to whatever size you want the page to have
int pageLimit = 100;
Page<SomeClass> page;
List<SomeClass> compoundList = new LinkedList<>();

do {
    PageRequest pageRequest = PageRequest.of(pageNumber, pageLimit);
    page = repository.findAll(pageRequest);
    List<SomeClass> listFromPage = page.getContent();
    // Do something with this list; the example below just collects it
    compoundList.addAll(listFromPage);
    pageNumber++;
} while (!page.isLast());

// Do something with the compoundList; example below
return compoundList;
Upvotes: 0
Reputation: 1
The best way to iterate over a large collection is to use the Mongo API directly. I used the code below and it worked like a charm for my use case.
I had to iterate over more than 15M records and the document size was huge for some of those.
The following code is from a Kotlin Spring Boot app (Spring Boot version 2.4.5):
fun getAbcCursor(batchSize: Int, from: Long?, to: Long?): MongoCursor<Document> {
    val collection = xyzMongoTemplate.getCollection("abc")
    val query = Document("field1", "value1")
    if (from != null) {
        val fromDate = Date(from)
        val toDate = if (to != null) { Date(to) } else { Date() }
        query.append(
            "createTime",
            Document("\$gte", fromDate).append("\$lte", toDate)
        )
    }
    return collection.find(query).batchSize(batchSize).iterator()
}
Then, from a service-layer method, you can just keep calling MongoCursor.next() on the returned cursor as long as MongoCursor.hasNext() returns true.
An important observation: do not forget to set batchSize on the FindIterable (the return type of MongoCollection.find()). If you don't provide a batch size, the cursor will fetch the initial 101 records and then hang (it tries to fetch all the remaining records at once).
For my scenario, I used a batch size of 2000, as it gave the best results during testing. The optimal batch size depends on the average size of your records.
Here is the equivalent code in Java (with createTime removed from the query, as it is specific to my data model):
MongoCursor<Document> getAbcCursor(int batchSize) {
    MongoCollection<Document> collection = xyzMongoTemplate.getCollection("your_collection_name");
    Document query = new Document("field1", "value1"); // query --> {"field1": "value1"}
    return collection.find(query).batchSize(batchSize).iterator();
}
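As a rough sketch of the service-layer loop described above (the process(...) call and the batch size of 2000 are just illustrative assumptions of mine), the cursor can be drained in a try-with-resources block, since MongoCursor is Closeable:
// Hypothetical consumer of the cursor; process(...) stands in for your own logic.
try (MongoCursor<Document> cursor = getAbcCursor(2000)) {
    while (cursor.hasNext()) {
        Document doc = cursor.next();
        process(doc); // handle one document at a time, keeping memory bounded
    }
}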
Upvotes: 0
Reputation: 7988
Late response, but maybe it will help someone in the future. Spring Data doesn't provide any API to wrap the MongoDB cursor capabilities. It uses the cursor internally within its find methods, but always returns a completed list of objects. Your options are to use the Mongo API directly or to use the Spring Data Paging API, something like this:
final int pageLimit = 300;
int pageNumber = 0;
Page<T> page = repository.findAll(new PageRequest(pageNumber, pageLimit));
while (page.hasNextPage()) {
    processPageContent(page.getContent());
    page = repository.findAll(new PageRequest(++pageNumber, pageLimit));
}
// process the last page
processPageContent(page.getContent());
UPD (!): This method is not sufficient for large sets of data (see @Shawn Bush's comments). Please use the Mongo API directly for such cases.
Upvotes: 29
Reputation: 1308
You may want to try the DBCursor way like this:
DBObject query = new BasicDBObject(); //setup the query criteria
query.put("method", method);
query.put("ctime", (new BasicDBObject("$gte", bTime)).append("$lt", eTime));
logger.debug("query: {}", query);
DBObject fields = new BasicDBObject(); //only get the needed fields.
fields.put("_id", 0);
fields.put("uId", 1);
fields.put("ctime", 1);
DBCursor dbCursor = mongoTemplate.getCollection("collectionName").find(query, fields);
while (dbCursor.hasNext()) {
    DBObject object = dbCursor.next();
    logger.debug("object: {}", object);
    //do something.
}
Upvotes: 2
Reputation: 1143
Use MongoTemplate::stream(), which is probably the most appropriate Java wrapper around DBCursor.
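A minimal sketch, assuming a recent Spring Data MongoDB version where stream() returns a java.util.stream.Stream (older versions return a CloseableIterator), and with MyEntity and processEntity(...) as placeholders of mine:
import java.util.stream.Stream;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Query;

public void processAll(MongoTemplate mongoTemplate) {
    // stream() keeps a cursor open under the hood, so close it via try-with-resources
    try (Stream<MyEntity> stream = mongoTemplate.stream(new Query(), MyEntity.class)) {
        stream.forEach(entity -> processEntity(entity));
    }
}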
Upvotes: 12
Reputation: 1607
Another way:
int pageNumber = 0;
Page<T> page;
do {
    page = repository.findAll(new PageRequest(pageNumber, pageLimit));
    // process page.getContent() here
    pageNumber++;
} while (!page.isLastPage());
Upvotes: 4