Reputation: 161
I'm using Solr 4.0.
I have a fairly straightforward setup: one entity with one sub-entity (1:N relation).
The data to import sits on a MySQL server.
The main table has about 30 million records and the sub table about 5 million (most parent entities have no sub-entity; the rest generally have just one).
I am running into rather horrible indexing (import) performance: about 80 entities (docs) per second, so indexing this table would in theory take a few days.
From what Solr reports, if I tell it to index the first 1000 entities, it actually issues 1000+ queries to MySQL. I have also tried setting the batchSize property on the data source, with no luck: only -1 works (any other value ends in an out-of-memory exception).
I'm really not sure what I can do to optimize this. Is there no PROPER data importer for MySQL?
Upvotes: 0
Views: 158
Reputation: 161
Though the cached entity approach helped me with another issue, I have found that using nested entities is usually just not the way to go.
Firing the sub-entity query once for each "root" entity is just never going to work at this scale.
I've rewritten my statements as a SQL JOIN that fetches the root and sub-entity together as a single row, mapped the columns to fields accordingly, and performance improved significantly.
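
To make the difference concrete, here is a minimal Python sketch of the two query patterns. It is not Solr code: an in-memory sqlite3 database stands in for the MySQL server, and the `parent`/`child` tables, their columns, and the function names are all made up for illustration. The nested version issues one sub-query per parent row, while the joined version issues a single query and folds rows that share a parent id into one doc.

```python
import sqlite3

# Hypothetical schema; sqlite3 in memory stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT);
    INSERT INTO parent VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO child VALUES (10, 2, 'x');
""")


def build_docs_nested(conn):
    """Nested-entity pattern: one sub-query per parent row (N + 1 queries)."""
    parents = conn.execute("SELECT id, title FROM parent ORDER BY id").fetchall()
    queries = 1
    docs = []
    for pid, title in parents:
        rows = conn.execute(
            "SELECT value FROM child WHERE parent_id = ? ORDER BY id", (pid,)
        ).fetchall()
        queries += 1
        docs.append({"id": pid, "title": title, "values": [r[0] for r in rows]})
    return docs, queries


def build_docs_joined(conn):
    """JOIN pattern: a single query; rows for the same parent become one doc."""
    sql = """
        SELECT p.id, p.title, c.id, c.value
        FROM parent p
        LEFT JOIN child c ON c.parent_id = p.id
        ORDER BY p.id, c.id
    """
    docs = {}
    for pid, title, child_id, value in conn.execute(sql):
        doc = docs.setdefault(pid, {"id": pid, "title": title, "values": []})
        if child_id is not None:  # LEFT JOIN yields NULLs for childless parents
            doc["values"].append(value)
    return list(docs.values()), 1


nested_docs, nested_queries = build_docs_nested(conn)
joined_docs, joined_queries = build_docs_joined(conn)
```

Both functions build the same docs; only the number of round trips differs. With 30 million parent rows that difference is 30 million extra queries, which would be consistent with the throughput described in the question.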
Upvotes: 0
Reputation: 15791
You could use CachedSqlEntityProcessor so that the sub-entity query is at least cached...
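
As a rough illustration of the caching idea (my understanding of what a cached sub-entity amounts to, not Solr's actual implementation), the sketch below reads the sub table once into a dictionary keyed by parent id and then resolves each parent with an in-memory lookup. sqlite3 stands in for MySQL, and the schema and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema; sqlite3 in memory stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, value TEXT);
    INSERT INTO parent VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO child VALUES (10, 2, 'x'), (11, 2, 'y');
""")


def load_child_cache(conn):
    """Read the whole sub table once and index its rows by parent id."""
    cache = defaultdict(list)
    for parent_id, value in conn.execute(
        "SELECT parent_id, value FROM child ORDER BY id"
    ):
        cache[parent_id].append(value)
    return cache


def build_docs_cached(conn):
    """Two queries in total: one for the cache, one for the parent rows."""
    cache = load_child_cache(conn)
    docs = []
    for pid, title in conn.execute("SELECT id, title FROM parent ORDER BY id"):
        # .get avoids creating empty entries for parents without children
        docs.append({"id": pid, "title": title, "values": list(cache.get(pid, []))})
    return docs


cached_docs = build_docs_cached(conn)
```

The trade-off is memory: the whole sub table (about 5 million rows in the question) has to fit in the cache.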
Upvotes: 1