Francesco

Reputation: 2382

"size exceeds the configured maximum" error while indexing

I need to index PDF files and was told Solr could do this, so I installed a Solr server on WebLogic and experimented with the web interface.

Then I wrote a JUnit test class to do the same things from Java with SolrJ.

I wrote some simple code that indexes a couple of PDFs and runs a query to check whether the documents were indexed:

@Test
  public void documentSearchTest() throws IDSystemException
  {
    try
    {
      server.deleteByQuery("*:*");

      Assert.assertTrue("Document not found! - " + TEST_PDF_DOCUMENT1, new File(TEST_PDF_DOCUMENT1).exists());
      Assert.assertTrue("Document not found! - " + TEST_PDF_DOCUMENT2, new File(TEST_PDF_DOCUMENT2).exists());

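      // req is created elsewhere in the test class (not shown); presumably a ContentStreamUpdateRequest.
      // Both files go into a single multipart request, so their sizes add up.
      req.addFile(new File(TEST_PDF_DOCUMENT1), CONTENT_TYPE_APPLICATION_PDF);
      req.addFile(new File(TEST_PDF_DOCUMENT2), CONTENT_TYPE_APPLICATION_PDF);

      NamedList<Object> result = server.request(req);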

      SolrQuery solrQuery = new SolrQuery().setQuery("*:*");

      QueryResponse rsp = server.query(solrQuery);

      SolrDocumentList docs = rsp.getResults();

    }
    catch (SolrServerException sse)
    {
      throw new IDSystemException(LOG, sse.getMessage(), sse);
    }
    catch (IOException ioe)
    {
      throw new IDSystemException(LOG, ioe.getMessage(), ioe);
    }
  } 

Running this test produces the following error:

<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <785764 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] INFO  org.apache.solr.core.SolrCore  ? [Collection1] REMOVING ALL DOCUMENTS FROM INDEX> 
<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <785764 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] INFO  org.apache.solr.update.processor.LogUpdateProcessor  ? [Collection1] webapp=/solr path=/update params={wt=javabin&version=2} {deleteByQuery=*:*} 0 0> 
<11.02.2014 09:08 Uhr MEZ> <Notice> <Stdout> <BEA-000000> <786215 [[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR org.apache.solr.servlet.SolrDispatchFilter  ? null:org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (2100088) exceeds the configured maximum (2097152)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl$1.raiseError(FileUploadBase.java:902)
    at org.apache.commons.fileupload.util.LimitedInputStream.checkLimit(LimitedInputStream.java:71)
    at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:128)
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
    at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
    at java.io.InputStream.read(InputStream.java:85)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
    at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
    at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:393)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3592)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
    at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:121)
    at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2202)
    at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2108)
    at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1432)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)> 

I checked the WebLogic settings (Servers -> Protocols -> HTTP), and Max Post Size is set to -1, which should mean unlimited.

Is there another setting somewhere that must also be changed?

EDIT: Here is the solrconfig.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
    <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
    <directoryFactory name='DirectoryFactory' class='solr.MMapDirectoryFactory' />

    <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />

    <lib dir='${solr.core.instanceDir}\lib' />
    <lib dir="${solr.core.instanceDir}\dist\" regex="solr-cell-\d.*\.jar" />
    <lib dir="${solr.core.instanceDir}\contrib\extraction\lib" regex=".*\.jar" />

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
        <lst name="defaults">
            <str name="update.chain">deduplication</str>
        </lst>
    </requestHandler>

    <requestHandler name="/update/extract"
        class="solr.extraction.ExtractingRequestHandler">
        <lst name="defaults">
            <str name="captureAttr">true</str>
            <str name="lowernames">true</str>
            <str name="overwrite">true</str>
            <str name="literalsOverride">true</str>
            <str name="fmap.a">link</str>
            <!-- the configuration here could be useful for tests -->
            <str name="update.chain">deduplication</str>
        </lst>
    </requestHandler>

    <updateRequestProcessorChain name="deduplication">
        <processor
            class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
            <bool name="overwriteDupes">false</bool>
            <str name="signatureField">uid</str>
            <bool name="enabled">true</bool>
            <str name="fields">content</str>
            <str name="minTokenLen">10</str>
            <str name="quantRate">.2</str>
            <str name="signatureClass">solr.update.processor.TextProfileSignature</str>
        </processor>
        <processor class="solr.LogUpdateProcessorFactory" />
        <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

    <requestHandler name="/admin/"
        class="org.apache.solr.handler.admin.AdminHandlers" />
    <admin>
        <defaultQuery>*:*</defaultQuery>
    </admin>

</config>

Upvotes: 0

Views: 4065

Answers (1)

Grooveek

Reputation: 10094

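Solr limits the size of multipart uploads. The limit is not a WebLogic setting; it is configured in solrconfig.xml and applies to requests sent to handlers such as the ExtractingRequestHandler.

The default appears to be 2048 KB, which matches the 2097152-byte maximum in your error message (2048 * 1024). Look at <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" /> in the <requestDispatcher ...> section and raise that value.

Your solrconfig.xml has no <requestDispatcher> section, so you'll have to add one: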

<requestDispatcher handleSelect="false" >
    <!-- set multipartUploadLimitInKB to the size you need -->
    <requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="2048000"
                formdataUploadLimitInKB="2048"
                addHttpRequestToContext="false"/>
</requestDispatcher>
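
To see how the numbers in the error message relate to multipartUploadLimitInKB, here is a minimal sketch in plain Java. The class and method names (UploadLimitCheck, limitInBytes, exceedsLimit, minimumLimitInKB) are made up for illustration and are not part of Solr or SolrJ; the only assumption is that the limit is counted in units of 1024 bytes, which is consistent with 2048 KB showing up as 2097152 in the log.

```java
// Hypothetical helper, not part of Solr: relates a multipartUploadLimitInKB
// value to the byte counts that appear in the SizeLimitExceededException.
final class UploadLimitCheck {

    /** Converts a limit given in KB to bytes, assuming 1 KB = 1024 bytes. */
    static long limitInBytes(long limitInKB) {
        return limitInKB * 1024L;
    }

    /** True if a request of this size is larger than the configured limit. */
    static boolean exceedsLimit(long requestSizeInBytes, long limitInKB) {
        return requestSizeInBytes > limitInBytes(limitInKB);
    }

    /** Smallest whole-KB limit that still admits a request of this size. */
    static long minimumLimitInKB(long requestSizeInBytes) {
        return (requestSizeInBytes + 1023L) / 1024L; // round up to the next KB
    }

    public static void main(String[] args) {
        long requestSize = 2100088L; // request size reported in the log above
        long defaultLimitInKB = 2048L; // default mentioned in this answer

        System.out.println("limit in bytes:   " + limitInBytes(defaultLimitInKB));
        System.out.println("exceeds limit:    " + exceedsLimit(requestSize, defaultLimitInKB));
        System.out.println("minimum limit KB: " + minimumLimitInKB(requestSize));
    }
}
```

Note that the size being checked is that of the whole multipart request, which includes part headers and boundaries as well as both PDFs, so it is somewhat larger than the sum of the file sizes. It is safer to leave generous headroom than to configure the exact minimum.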

Upvotes: 1
