Reputation: 628
I have two sets of documents. The first is a CSV file that contains each person's name, the corresponding rank and a doc id. A screenshot of it is below.
The second is a set of text files containing paragraphs. Each file is named after its doc id. Here is a screenshot of one of them.
I need to index these two together as one doc in Solr, so that each Solr doc has the format:
Person: arthur w cabot
KDE Rank: 5.98+108
Text: Text from the other set of documents
How can I achieve this? Also, I would like to know if there is another approach I could follow.
Upvotes: 1
Views: 221
Reputation: 8658
In your case you can build the Solr document yourself and commit it to Solr, something like this:
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "123456");
document.addField("title", fileName);
document.addField("text", contentBuilder.toString());
solr.add(document);
solr.commit();
In your case the fields are personName, personRank and documentContent. I assume you will read the CSV file yourself, take the document name from it, and that you already know where the documents are located.
As mentioned, when you read the CSV file you get the data for personName and personRank directly.
The third field is the document content. Since the CSV only gives you the document file name, read the content of that file and pass it to the Solr document as the third field.
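The joining step described above could be sketched roughly as follows, using only the JDK. This is a minimal sketch under several assumptions that are not from the question: the CSV has a header row, its columns are in the order name, rank, doc id, no value contains a quoted comma, and each text file is named `<docId>.txt` inside one directory. The class name `CsvTextJoiner` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvTextJoiner {

    /**
     * Builds one field map per CSV row: id, personName, personRank and documentContent.
     * Assumed layout (not from the question): header row, columns name,rank,docId,
     * no quoted commas, text files named "<docId>.txt" in textDir.
     */
    static List<Map<String, String>> join(Path csvFile, Path textDir) throws IOException {
        List<Map<String, String>> docs = new ArrayList<>();
        List<String> rows = Files.readAllLines(csvFile, StandardCharsets.UTF_8);
        // Skip the header row, if there is one.
        for (String row : rows.subList(Math.min(1, rows.size()), rows.size())) {
            if (row.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cols = row.split(",", -1);
            if (cols.length < 3) {
                continue; // ignore malformed rows
            }
            String docId = cols[2].trim();
            Path textFile = textDir.resolve(docId + ".txt");
            if (!Files.exists(textFile)) {
                continue; // no matching text file for this doc id
            }
            String text = new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", docId);
            doc.put("personName", cols[0].trim());
            doc.put("personRank", cols[1].trim());
            doc.put("documentContent", text);
            docs.add(doc);
        }
        return docs;
    }
}
```

Each returned map can then be copied into a SolrInputDocument with one addField call per entry. For CSV files with quoted values a real CSV parser would be a safer choice than split.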
I have tried one option for you, something like this:
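import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Connect to the core that will hold the documents
String urlString = "http://localhost:8983/solr/TestCore";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();
// Read the whole text file, line by line, into one string
StringBuilder contentBuilder = new StringBuilder();
try (Stream<String> stream = Files.lines(Paths.get("D:/LogFolder/IB4_buildViewSchema.txt"),
        StandardCharsets.UTF_8)) {
    stream.forEach(s -> contentBuilder.append(s).append("\n"));
} catch (IOException e) {
    e.printStackTrace();
}
try {
    File file = new File("D:/LogFolder/IB4_buildViewSchema.txt");
    String fileName = file.getName();
    // Build the Solr document and send it to the core
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", "123456");
    document.addField("title", fileName);
    document.addField("text", contentBuilder.toString());
    solr.add(document);
    // Make the added document visible to searches
    solr.commit();
} catch (SolrServerException | IOException e) {
    e.printStackTrace();
}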
You would run this in a loop over all the rows of the CSV.
Check whether you can index in batches, and look at optimizing the code as well. This code is not a foolproof solution for your problem.
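As a rough illustration of the batching idea, the helper below splits a list into chunks and hands each chunk to a callback. It is a sketch, not part of the code above: the class name `BatchSender` and the method `sendInBatches` are hypothetical, and the batch size is something you would have to tune. With SolrJ, the callback could call solr.add(batch) on a list of SolrInputDocument objects, followed by a single solr.commit() after all batches are sent. Since solr.add throws checked exceptions, the lambda would have to catch or wrap them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchSender {

    /**
     * Passes items to the sink in chunks of at most batchSize elements.
     * Returns the number of chunks that were sent.
     */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so the sink gets a list independent of the source list
            sink.accept(new ArrayList<>(items.subList(start, end)));
            batches++;
        }
        return batches;
    }
}
```

Committing once at the end, instead of after every document, usually keeps indexing noticeably cheaper, because each commit makes Solr open a new searcher.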
I verified that the data was indexed by querying Solr from the Solr admin page (see the image below).
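The same check can also be done without the admin page by requesting the core's /select handler directly. The sketch below only builds the request URL. The class name `SolrSelectUrl` is hypothetical, and the core URL and the id are the ones used in the code above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrSelectUrl {

    /** Builds a /select URL for the given core URL and query string, URL-encoding the query. */
    static String build(String coreUrl, String query) throws UnsupportedEncodingException {
        // Drop a trailing slash so the path is not doubled
        String base = coreUrl.endsWith("/") ? coreUrl.substring(0, coreUrl.length() - 1) : coreUrl;
        return base + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8.name());
    }
}
```

For example, build("http://localhost:8983/solr/TestCore", "id:123456") gives a URL that can be opened in a browser or fetched with any HTTP client. If the document was indexed, it should appear in the response.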
Note: I built a Maven project and wrote the above code in it. If you want, you can use the pom.xml below for reference.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>solr</groupId>
  <artifactId>TestSolr2</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>TestSolr2</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.target>1.8</maven.compiler.target>
    <maven.compiler.source>1.8</maven.compiler.source>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>7.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-cell</artifactId>
      <version>7.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
Upvotes: 3