I have an application that is a mini-CRM. I am trying to add functionality to allow for bulk user imports. The upload handler reads the data from a CSV file and then calls my CustomerService class to store the Customer objects in the datastore:
public int createCustomers(final List<Customer> customers) {
    // bucketList splits the input into smaller lists (batches of 1000 here)
    List<List<Customer>> buckets = bucketList(customers);
    PersistenceManager persistenceManager = PMF.get().getPersistenceManager();
    try {
        for (List<Customer> bucket : buckets) {
            // Persist each batch with a single makePersistentAll call
            persistenceManager.makePersistentAll(bucket);
        }
    } finally {
        // Release the PersistenceManager even if a batch fails
        persistenceManager.close();
    }
    return customers.size();
}
The bucketList method just breaks a large list into smaller lists. I did this to tune the application and see whether there is an optimal batch size for the makePersistentAll call. I currently have the batch size set to 1000 and am testing with a CSV file that contains 100,000 records. The import gets progressively slower as more records are added, most noticeably at around the 60K record mark. I've tried making all the fields in Customer unindexed, but that doesn't seem to make any noticeable difference:
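The bucketList helper itself is not shown. As a minimal sketch, assuming it simply slices the list into consecutive chunks, it could look like the following. The Buckets class name and the bucketSize parameter are hypothetical; the original method takes only the list and presumably uses a fixed size of 1000 internally.

```java
import java.util.ArrayList;
import java.util.List;

public final class Buckets {
    private Buckets() {}

    /**
     * Hypothetical helper: splits items into consecutive sublists of at most
     * bucketSize elements. The last bucket holds the remainder.
     */
    public static <T> List<List<T>> bucketList(List<T> items, int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<List<T>> buckets = new ArrayList<>();
        for (int start = 0; start < items.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, items.size());
            // Copy the subList view so each bucket is independent of the source list
            buckets.add(new ArrayList<>(items.subList(start, end)));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        List<List<Integer>> buckets = bucketList(ids, 1000);
        System.out.println(buckets.size());        // 3
        System.out.println(buckets.get(2).size()); // 500
    }
}
```

Copying each slice costs a little extra memory but avoids surprises if the source list is modified while batches are still being persisted.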
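import java.io.Serializable;
import javax.jdo.annotations.Extension;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import com.google.appengine.api.datastore.Key;

@PersistenceCapable
public class Customer implements Serializable {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    // gae.unindexed: store the property without building a datastore index for it
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    @Persistent
    private String accountNumber;

    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    @Persistent
    private String email;

    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    @Persistent
    private String firstName;

    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    @Persistent
    private String lastName;
    ...
}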
I've tested this both locally on the development server and on production App Engine, with no improvement. I would think this is a fairly common use case: importing a large amount of data into the system and saving it quickly to the datastore. I've tried a number of things to get this to work:
- Using the AsyncDatastoreService
- Saving Customer objects one at a time (makePersistent)
- Using a Key object in Customer as the primary key
- Using the accountNumber string as the primary key

None of these made much of a difference.
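To find out whether the slowdown near 60K records comes from each batch taking longer, one option is to time every batch separately. The sketch below is generic and hypothetical (BatchTimer and timeBatches are illustrative names, not part of the original code); in the real service the Consumer would wrap persistenceManager.makePersistentAll(batch).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public final class BatchTimer {
    private BatchTimer() {}

    /**
     * Hypothetical helper: runs persistBatch once per batch and records the
     * elapsed wall-clock time of each call in milliseconds.
     */
    public static <T> List<Long> timeBatches(List<List<T>> batches, Consumer<List<T>> persistBatch) {
        List<Long> millis = new ArrayList<>(batches.size());
        for (List<T> batch : batches) {
            long start = System.nanoTime();
            persistBatch.accept(batch);
            millis.add((System.nanoTime() - start) / 1_000_000L);
        }
        return millis;
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5));
        List<Integer> sink = new ArrayList<>();
        // In-memory stand-in for persistenceManager.makePersistentAll(batch)
        List<Long> timings = timeBatches(batches, sink::addAll);
        for (int i = 0; i < timings.size(); i++) {
            System.out.println("batch " + i + ": " + timings.get(i) + " ms");
        }
    }
}
```

If the per-batch times rise steadily, the cost of each call is growing with the amount already persisted; if they stay flat, the time is more likely being spent outside the persistence calls.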
I suggest looking at http://www.datanucleus.org/products/accessplatform_3_2/jdo/performance_tuning.html, in particular the "Persistence Process" section on persisting large numbers of objects. You could reduce the number of objects passed to each makePersistentAll() call so that the work is spread over several calls. It is also quite possible that some GAE/Datastore quirk is contributing to this.
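One way to act on this is to give each batch its own short-lived PersistenceManager rather than pushing all 100,000 objects through one instance, since a single PersistenceManager may keep managing every object it has persisted until it is closed. Whether this removes the slowdown on App Engine is untested. The sketch uses a hypothetical BatchSession interface so it runs without JDO; in the real code, openSession would call PMF.get().getPersistenceManager(), persistAll would call makePersistentAll, and close would call PersistenceManager.close().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

public final class PerBatchSessions {
    private PerBatchSessions() {}

    /** Hypothetical stand-in for a JDO PersistenceManager. */
    public interface BatchSession<T> extends AutoCloseable {
        void persistAll(Collection<T> batch);

        @Override
        void close(); // narrowed: no checked exception
    }

    /** Opens a fresh session per batch and closes it before starting the next one. */
    public static <T> int persistInBatches(List<T> items, int batchSize,
                                           Supplier<? extends BatchSession<T>> openSession) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int persisted = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            List<T> batch = items.subList(start, Math.min(start + batchSize, items.size()));
            // try-with-resources closes the session even if persistAll throws
            try (BatchSession<T> session = openSession.get()) {
                session.persistAll(batch);
            }
            persisted += batch.size();
        }
        return persisted;
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        int[] opened = {0};
        // In-memory fake standing in for PMF.get().getPersistenceManager()
        Supplier<BatchSession<String>> factory = () -> {
            opened[0]++;
            return new BatchSession<String>() {
                @Override public void persistAll(Collection<String> batch) { stored.addAll(batch); }
                @Override public void close() { }
            };
        };
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add("acct-" + i);
        }
        int count = persistInBatches(ids, 1000, factory);
        System.out.println(count + " persisted using " + opened[0] + " sessions"); // 2500 persisted using 3 sessions
    }
}
```

Opening a PersistenceManager per batch adds a small fixed cost per batch, which is why it pairs with the batch-size tuning described in the question rather than with one-object-at-a-time saves.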