Reputation: 465
I am using the Solr search engine for document retrieval in my project. My dataset is in .txt file format, but Solr only seems to offer options for JSON, XML, PDF and some other file formats. There is no option for text files.
Do I need to modify Solr in some way to use .txt files as a dataset?
Upvotes: 1
Views: 1410
Reputation: 2193
I found a very useful line in the quickstart guide https://lucene.apache.org/solr/5_3_1/quickstart.html
java -classpath /solr-5.0.0/dist/solr-core-5.0.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool docs/
The part that is especially useful for me is -Dauto=yes. When this option is turned on, Solr can handle many types of files (don't ask me why):
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
All I know is that I turned that option on, and now my instance will accept pdf, xml and txt files.
Upvotes: 0
Reputation: 73
You can use the CSV request handler to take care of this: https://wiki.apache.org/solr/UpdateCSV. It lets you configure the delimiter and escape characters. For example, if you have a "|" delimited file, you can specify "&separator=|".
The command below indexes a tab-delimited text file:
curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=\&stream.file=/tmp/result.txt'
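If you would rather not percent-encode the separator and escape characters by hand, the same URL can be built programmatically. This is only a sketch: the helper name csv_update_url is made up for illustration, the endpoint and parameters are copied from the curl command above, and stream.file only works if remote streaming is enabled on your Solr instance.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command above; adjust host and core for your setup.
SOLR_CSV_URL = "http://localhost:8983/solr/update/csv"


def csv_update_url(path, separator="\t", escape="\\", base=SOLR_CSV_URL):
    """Build a CSV update URL with the separator and escape characters encoded.

    Hypothetical helper: it only assembles the URL, it does not contact Solr.
    """
    params = {
        "commit": "true",
        "separator": separator,  # a tab is encoded as %09
        "escape": escape,        # a backslash is encoded as %5C
        "stream.file": path,     # server-side path, needs remote streaming enabled
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(csv_update_url("/tmp/result.txt"))
```

The resulting URL can be passed to curl or any HTTP client in place of the hand-written one.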
Upvotes: 0
Reputation: 684
Apart from txt files, Solr can also index several other document formats. Take a look at Apache Tika for details.
Upvotes: 0
Reputation: 968
Most probably your .txt files contain space-separated documents. To index them, you can write a Python script that streams the documents to Solr and then performs a commit.
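A minimal sketch of such a script, using only the standard library. Several things here are assumptions and not something from your setup: the core name gettingstarted (borrowed from the quickstart), the field name content_txt (this relies on a *_txt dynamic field or a schemaless config), treating each file as one document, and using the file name as the id. The function names are made up for illustration.

```python
import json
import pathlib
import urllib.request

# Assumed core name and port; commit=true makes Solr commit after the update.
SOLR_UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update?commit=true"


def file_to_doc(path):
    """Turn one .txt file into a Solr document (one file = one document)."""
    path = pathlib.Path(path)
    text = path.read_text(encoding="utf-8", errors="replace")
    # "content_txt" is a hypothetical field name; it must exist in your schema.
    return {"id": path.name, "content_txt": text}


def build_request(docs, url=SOLR_UPDATE_URL):
    """Build the JSON update request without sending it."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_directory(directory, url=SOLR_UPDATE_URL):
    """Send every .txt file in the directory to Solr in a single update."""
    docs = [file_to_doc(p) for p in sorted(pathlib.Path(directory).glob("*.txt"))]
    with urllib.request.urlopen(build_request(docs, url)) as response:
        return response.status


if __name__ == "__main__":
    # Requires a running Solr instance; "docs/" is a placeholder directory.
    print(index_directory("docs/"))
```

If each line of a file is a separate document, file_to_doc would need to split the text and generate one id per line instead.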
Upvotes: 0
Reputation: 9320
All you need to do is index your txt file.
For more info and concrete examples, take a look here: http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407
Upvotes: 0