Lukas Trenz

solr /update/extract 404 Not Found

I'm encountering an issue while trying to upload documents to Solr via the /update/extract endpoint.

I run Solr 8.5.2 and ZooKeeper 3.5.8 in Docker and could previously index data via

...
solr.add(solr_documents)

My Setup:

The filesystem (the django folder is not relevant to the problem):

[screenshot]

The files in solr:

[screenshot]

The file in solr-config:

[screenshot]

I use the following docker-compose.yaml (the django image isn't relevant to the problem):

version: "1.0"
services:
  solr:
    build:
      context: solr/.
      dockerfile: Dockerfile
    container_name: aips-solr
    hostname: aips-solr
    ports:
      - 8983:8983
    environment:
      - ZK_HOST=aips-zk:2181
      - SOLR_HOST=aips-solr
    networks:
      - zk-solr
      - solr-django
    restart: unless-stopped
    depends_on:
      - zookeeper
    volumes:
      - ./solr/solr-config:/opt/solr/server/solr/configsets/_default/conf

  zookeeper:
    image: zookeeper:3.5.8
    container_name: aips-zk
    hostname: aips-zk
    ports:
      - 2181:2128
    networks:
      - zk-solr
      - solr-django
    restart: unless-stopped

  django:
    build:
      context: django/.
      dockerfile: Dockerfile
    container_name: django
    hostname: django
    ports:
      - 4000:4000
    depends_on:
      - solr
    volumes:
      - ./django/app:/app
    networks:
      - solr-django

networks:
  zk-solr:
  solr-django:

The Dockerfile contains:

FROM solr:8.5.2

USER root

ADD run_solr_w_ltr.sh ./run_solr_w_ltr.sh
RUN chown solr:solr run_solr_w_ltr.sh
RUN chmod u+x run_solr_w_ltr.sh


RUN chown -R solr:solr /opt/solr/

USER solr

ENTRYPOINT "./run_solr_w_ltr.sh" 

The launch_sorl.sh script contains the following (it copies the Learning to Rank plugin into Solr):

#!/bin/sh 
mkdir -p /var/solr/data/lib/
cp dist/solr-ltr-*.jar /var/solr/data/lib/
ls /var/solr/data/lib

solr-foreground -Dsolr.ltr.enabled=true

The launch_solr.sh script builds the image with:

#!/bin/sh

docker build . -t aips-solr

Solr runs successfully and the admin UI can be accessed via http://localhost:8983/solr/#/

I followed the instructions at https://solr.apache.org/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html

I created a file called solrconfig.xml in the subfolder solr:

[screenshot]

The content is:

<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

I checked that the solr folder exists and contains the files.

I created a new index in the Solr admin UI:

[screenshot]

It should be using the config from the directory

/opt/solr/server/solr/configsets/_default/conf

right?

I set the volume via

volumes:
      - ./solr/solr-config:/opt/solr/server/solr/configsets/_default/conf

Therefore the effective config should be the one in my solrconfig.xml:

<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

right?

The parser-specific properties are optional, if I understand correctly.

If I call the /update/extract endpoint of the collection via the admin UI

[screenshot]

I get

[screenshot]

If I use Postman

[screenshot]

with a POST request to the URI http://localhost:8983/solr/test10/update/extract

and the key/value pairs:

Key          Value
extractOnly  true
wt           json
stream.file  Zertifikate.pdf
stream.body  xaAgikF464R9gR7Jz7ACA0... (base64 string)

I also get

[screenshot]

The same happens if I use an adjusted curl command like the one in the docs:

curl "http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc6&defaultField=text&commit=true" --data-binary @example/exampledocs/sample.html -H 'Content-type:text/html'
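For reference, the same request could also be issued from Python. This is only a sketch using the standard library: build_extract_url and post_file are hypothetical helper names, and the base URL, collection name and parameters are simply the ones from the Postman and curl examples above.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url: str, collection: str, params: Dict[str, str]) -> str:
    """Return the /update/extract URL of a collection with URL-encoded query parameters."""
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"


def post_file(url: str, path: str, content_type: str) -> bytes:
    """POST the raw bytes of a local file to Solr and return the response body.

    urlopen raises urllib.error.HTTPError for a 404 or any other error status.
    """
    with open(path, "rb") as handle:
        request = Request(url, data=handle.read(), method="POST")
    request.add_header("Content-Type", content_type)
    with urlopen(request) as response:
        return response.read()
```

With these helpers, post_file(build_extract_url("http://localhost:8983/solr", "test10", {"extractOnly": "true", "wt": "json"}), "Zertifikate.pdf", "application/pdf") would send the PDF as the request body, so stream.file and stream.body would not be needed.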

What I tried so far

I changed the path of the solr folder to a relative path in solrconfig.xml:

<lib dir="../../../../../solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../../../solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

I checked that the solr folder contains the .jars.

I checked that I can access the collection:

[screenshot]

I checked that the user solr has the right permissions.
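One further check that might narrow this down (a sketch, not one of the steps above): ask Solr's Config API which request handlers the collection's effective configuration registers. The helper names are hypothetical, and the response shape (a "config" object containing a "requestHandler" map keyed by handler path) is an assumption about the Config API output that should be verified against the running instance.

```python
import json
from typing import Set
from urllib.request import urlopen


def handler_names(config_api_body: str) -> Set[str]:
    """Extract the request handler paths from a Config API JSON response."""
    parsed = json.loads(config_api_body)
    # Assumed shape: {"config": {"requestHandler": {"/select": {...}, ...}}}
    return set(parsed.get("config", {}).get("requestHandler", {}).keys())


def fetch_handler_names(solr_url: str, collection: str) -> Set[str]:
    """Query a running Solr for the handlers of one collection (needs network access)."""
    url = f"{solr_url.rstrip('/')}/{collection}/config/requestHandler?wt=json"
    with urlopen(url) as response:
        return handler_names(response.read().decode("utf-8"))
```

If "/update/extract" is missing from fetch_handler_names("http://localhost:8983/solr", "test10"), the collection is not using the edited solrconfig.xml, which would be consistent with the 404.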

My setup must be wrong but I can't find any other clues on how to find and solve the error.

Any help or advice would be greatly appreciated.

Based on MatsLindh's comment, I have made the following further changes.

According to the admin interface you're running Solr in cloud mode - that means that you have to explicitly upload your config set to the running zookeeper instance. See solr.apache.org/guide/solr/latest/deployment-guide/… - you might want to run it as a single instance instead of using the built-in cluster support if you want to just have a single node and supply the configuration on the file system instead. By MatsLindh

I uploaded the config with the following steps:

  1. I started Docker with

docker-compose up

  2. I uploaded the config from a second PowerShell window with the command

docker-compose exec solr solr zk upconfig -n newconfig -d /opt/solr/server/solr/configsets/_default/conf -z zookeeper:2181
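To confirm that an uploaded configset actually reached ZooKeeper, the Configsets API can list what the cluster knows about. The following is a minimal sketch with hypothetical helper names; it assumes the LIST action returns a JSON object with a "configSets" array and that Solr is reachable at the URL passed in.

```python
import json
from typing import List
from urllib.request import urlopen


def parse_configsets(body: str) -> List[str]:
    """Return the configset names from a Configsets API LIST response."""
    # Assumed shape: {"responseHeader": {...}, "configSets": ["_default", ...]}
    return json.loads(body).get("configSets", [])


def list_configsets(solr_url: str) -> List[str]:
    """Fetch the configset names from a running SolrCloud node (needs network access)."""
    url = f"{solr_url.rstrip('/')}/admin/configs?action=LIST&wt=json"
    with urlopen(url) as response:
        return parse_configsets(response.read().decode("utf-8"))
```

After a successful upconfig, list_configsets("http://localhost:8983/solr") would be expected to include "newconfig".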

The upconfig command uploads the configuration in that folder to ZooKeeper. Afterwards the file solrconfig.xml had to be adapted as follows:

<config>
   <luceneMatchVersion>8.5.2</luceneMatchVersion>
   <lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
   <lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

   <requestHandler name="/update/extract" 
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
         <str name="lowernames">true</str>
         <str name="fmap.content">content</str>
      </lst>
   </requestHandler>
</config>

A schema.xml also needed to be created. I used the schema:

<?xml version="1.0" encoding="UTF-8" ?>
<schema>
    <fieldType name="text_general" class="solr.TextField" 
     positionIncrementGap="100"> 
        <analyzer type="index"> 
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" 
            words="stopwords.txt" />
           <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" 
            words="stopwords.txt" />
            <filter class="solr.SynonymFilterFactory" 
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

    <fields>
        <field name="title" type="text_general" indexed="true" 
        stored="true"/>
        <field name="content" type="text_general" indexed="true" 
        stored="true"/>
    </fields>
</schema>

Because of the schema, the two text files synonyms.txt and stopwords.txt had to be created. After the changes my folder structure looks like this:

[screenshot]

After all the changes I get the following error when I try to create a new collection with the configset:

[screenshot]

Possibly unhandled rejection: {"data":{"responseHeader":{"status":400,"QTime":620},"failure":{"aips-solr:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://aips-solr:8983/solr: Error CREATEing SolrCore 'test_upload_3_shard1_replica_n1': Unable to create core [test_upload_3_shard1_replica_n1] Caused by: null"},"Operation create caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Underlying core creation failed while creating collection: test_upload_3","exception":{"msg":"Underlying core creation failed while creating collection: test_upload_3","rspCode":400},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Underlying core creation failed while creating collection: test_upload_3","code":400}},"status":400,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","url":"admin/collections","params":{"wt":"json","_":1687760309417,"action":"CREATE","name":"test_upload_3","router.name":"compositeId","numShards":1,"collection.configName":"newconfig","replicationFactor":1,"maxShardsPerNode":1,"autoAddReplicas":"false"},"headers":{"Accept":"application/json, text/plain, /","X-Requested-With":"XMLHttpRequest"},"timeout":10000},"statusText":"Bad Request","xhrStatus":"complete","resource":{}}
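The payload above is hard to read as one line. A small sketch for pulling out the parts that matter, assuming the JSON after "Possibly unhandled rejection:" has been parsed into a dict; summarize_failure is a hypothetical helper and SAMPLE is a trimmed excerpt of the response, not the full payload.

```python
from typing import Any, Dict, List

# Trimmed excerpt of the response shown above (most fields omitted).
SAMPLE: Dict[str, Any] = {
    "data": {
        "failure": {
            "aips-solr:8983_solr": "Unable to create core [test_upload_3_shard1_replica_n1] Caused by: null"
        },
        "error": {
            "msg": "Underlying core creation failed while creating collection: test_upload_3",
            "code": 400,
        },
    },
    "status": 400,
}


def summarize_failure(payload: Dict[str, Any]) -> List[str]:
    """Collect the per-node failure messages and the top-level error message."""
    data = payload.get("data", {})
    lines = [f"{node}: {message}" for node, message in data.get("failure", {}).items()]
    error_msg = data.get("error", {}).get("msg")
    if error_msg:
        lines.append(f"error: {error_msg}")
    return lines
```

The "Caused by: null" part suggests that the response itself does not carry the root cause; the Solr log of the aips-solr container is probably the place to look for the underlying exception.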

I suspect a network or firewall issue. The guess is based on the Stack Overflow post "Failed to create collection".

I will check it this evening on another PC.
