Siddharth Sinha

Reputation: 628

How to insert two documents into Solr as one document

I have two kinds of documents. The first is a CSV file that contains each person's name, their corresponding rank and a doc id. A screenshot of it is below.
[Screenshot of the CSV file]

The other set of documents contains paragraphs. Each of these files is named after its doc id and is in plain-text format. Here is a screenshot of them:
[Screenshot of the text files]

I need to insert these two as one document in Solr, so that each Solr document has the following format:

Person: arthur w cabot
KDE Rank: 5.98+108
Text: Text from the other set of documents

How can I achieve this? I would also like to know whether there is another approach I could follow.

Upvotes: 1

Views: 221

Answers (1)

Abhijit Bashetti

Reputation: 8658

In your case, you can build the Solr document yourself and commit it to Solr, something like this:

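// solr is a SolrClient; fileName and contentBuilder hold the file's name and its text
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "123456");     // value for the unique key field
document.addField("title", fileName);
document.addField("text", contentBuilder.toString());
solr.add(document);
solr.commit();                         // make the document visible to searches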

In your case, the fields are personName, personRank and documentContent. I assume you will read the CSV file yourself, retrieve the document name from it, and already know where the document is located.

By reading the CSV file, you get the data for personName and personRank directly.
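Splitting one CSV row into those values can be sketched as below. This is a minimal sketch assuming the columns are in the order name, rank, doc id, separated by plain commas with no quoted or embedded commas; `CsvRow` and `parse` are hypothetical names. The rank is kept as a string because the exact numeric format in the CSV (for example 5.98+108) is not clear from the question.

```java
// Hypothetical value holder for one CSV row: name, rank, doc id.
class CsvRow {
    final String personName;
    final String personRank;
    final String docId;

    CsvRow(String personName, String personRank, String docId) {
        this.personName = personName;
        this.personRank = personRank;
        this.docId = docId;
    }

    // Assumes plain comma-separated columns in the order name,rank,docId
    // with no quoted fields; use a real CSV parser if values can contain commas.
    static CsvRow parse(String line) {
        String[] cols = line.split(",");
        if (cols.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 columns but got " + cols.length + ": " + line);
        }
        return new CsvRow(cols[0].trim(), cols[1].trim(), cols[2].trim());
    }
}
```

row.personName and row.personRank can then go straight into document.addField("personName", ...) and document.addField("personRank", ...).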

The third field is documentContent. Since the CSV only gives you the document's file name, you need to read the content of that file and pass it to the Solr document as the third field.
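Reading the content for a given doc id might look like the sketch below. It assumes all text files sit in one directory and are named `<docId>.txt` (the question only says the files are named after the doc id, so the `.txt` extension is an assumption) and that they are UTF-8 encoded. `DocContentReader` and `readContent` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that loads the paragraph text for one doc id.
class DocContentReader {

    // Assumption: files are stored as <textDir>/<docId>.txt in UTF-8.
    static String readContent(Path textDir, String docId) throws IOException {
        Path file = textDir.resolve(docId + ".txt");
        // Files.readAllBytes is available on Java 8, which the pom.xml targets
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

The returned string is what you would pass as document.addField("documentContent", content).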

I have put together one option for you, something like this:

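import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexTextFile {
    public static void main(String[] args) {
        String urlString = "http://localhost:8983/solr/TestCore";
        String filePath = "D:/LogFolder/IB4_buildViewSchema.txt";

        // Read the whole text file into one string, keeping the line breaks
        StringBuilder contentBuilder = new StringBuilder();
        try (Stream<String> stream = Files.lines(Paths.get(filePath), StandardCharsets.UTF_8)) {
            stream.forEach(s -> contentBuilder.append(s).append("\n"));
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Build one Solr document and index it; the client is closed automatically
        try (SolrClient solr = new HttpSolrClient.Builder(urlString).build()) {
            String fileName = new File(filePath).getName();
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", "123456");
            document.addField("title", fileName);
            document.addField("text", contentBuilder.toString());
            solr.add(document);
            solr.commit(); // make the new document visible to searches
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
    }
}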

You would repeat this for every row of the CSV.
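The per-row loop can be sketched as below: read the CSV once, skip the header, and build one set of field values per row. This sketch assumes the first CSV line is a header, the columns are name, rank, doc id with no quoted commas, and each text file is `<docId>.txt` in UTF-8; reusing the doc id as the Solr `id` is also an assumption. `CsvToSolrFields` and `buildAll` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one field map per CSV row, ready to copy into a SolrInputDocument.
class CsvToSolrFields {

    // Assumptions: first line is a header, columns are name,rank,docId with no
    // quoted commas, and each text file is <textDir>/<docId>.txt in UTF-8.
    static List<Map<String, String>> buildAll(Path csvFile, Path textDir) throws IOException {
        List<String> lines = Files.readAllLines(csvFile, StandardCharsets.UTF_8);
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {      // start at 1 to skip the header row
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;                               // ignore blank lines
            }
            String[] cols = line.split(",");
            if (cols.length != 3) {
                throw new IllegalArgumentException("Bad CSV row " + (i + 1) + ": " + line);
            }
            String docId = cols[2].trim();
            byte[] text = Files.readAllBytes(textDir.resolve(docId + ".txt"));

            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("id", docId);                    // doc id doubles as the unique key
            fields.put("personName", cols[0].trim());
            fields.put("personRank", cols[1].trim());
            fields.put("documentContent", new String(text, StandardCharsets.UTF_8));
            docs.add(fields);
        }
        return docs;
    }
}
```

Each map can then be copied into a new SolrInputDocument with fields.forEach(document::addField) before calling solr.add(document).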

Check whether you can send the documents in batches, and look into optimizing the code as well. This code is not a foolproof solution to your problem.
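One way to batch is to group the documents into fixed-size chunks and send each chunk with a single solr.add(...) call (SolrJ's SolrClient has an add overload that takes a Collection<SolrInputDocument>), committing once at the end instead of after every document. The generic `partition` helper below is a hypothetical sketch of the grouping step only; the batch size is an example value you would tune for your data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a list into consecutive batches of at most batchSize items.
class Batches {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // copy so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With SolrJ this would be used roughly as for (List<SolrInputDocument> batch : Batches.partition(allDocs, 500)) { solr.add(batch); } followed by a single solr.commit().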

I verified that the data was indexed by querying Solr from the Solr Admin page. Please refer to the image below:

[Screenshot: Solr Admin Page]
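If you prefer to check from a browser or from code instead of the admin UI, the same query can be sent to the core's /select handler over HTTP. The sketch below only builds the request URL; it assumes the TestCore core from the code above and a personName field in your schema, and the query value is just an example. `SelectUrl` and `forField` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a /select URL for a phrase match on one field.
class SelectUrl {

    static String forField(String coreUrl, String field, String value)
            throws UnsupportedEncodingException {
        // Quote the value so a multi-word name is matched as a phrase
        String q = field + ":\"" + value + "\"";
        // URLEncoder.encode(String, String) is available on Java 8
        return coreUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8");
    }
}
```

Opening the resulting URL returns the matching documents, and numFound in the response tells you how many matched.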

Note: I built a Maven project and wrote the above code in it. You can use the pom.xml below for reference.

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>solr</groupId>
    <artifactId>TestSolr2</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>TestSolr2</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-solrj</artifactId>
            <version>7.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-cell</artifactId>
            <version>7.6.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Upvotes: 3
