Reputation: 31
I am working on a prototype of a search system.
I have a table in Oracle with several fields. I generated realistic-looking data for it, around 300,000 rows. For example:
PaymentNo|Datetime        |AmountEuro|PayersName            |PayersPhoneNo|ReceiversLegal|ReceiversAcc
2314     |2015-07-21T15:14|15.63     |Clinton, Barack Anjela|1.918.0060657|Nasa          |5555569778664190000
230338   |2015-08-01T15:14|34.87     |Merkel, George Donald |1.653.0060658|PepsiCo       |7777828443194736000
(Actually, there are more columns.)
The size of the table in Oracle is 62 MB (as reported by Toad).
I imported the table into Solr 5.2.1 (on Windows). The size of the index with stored data is 88 MB (on disk). The size of the index without stored data is 67 MB.
My question is: can I decrease the size of the index?
I have already tested these options: decreasing the number of indexed table columns, switching off data storage in Solr, and excluding some of the rows from the index.
I need another way to decrease the size of the index. Do you know of any?
Upvotes: 3
Views: 3924
Reputation: 13402
You can use all the insights provided here. Some additional points I wanted to share.
Solr duplicates data across several index structures in order to provide fast search over the indexed data. One important thing about Solr is that it stores all of this data in immutable data structures, so space taken by updated or deleted documents is only reclaimed when segments are merged.
You can disable document-level term vector storage if you are not using Solr's highlighting feature: make sure the field definitions in your schema do not set termVectors="true" (and likewise termPositions and termOffsets).
Additionally, Solr uses different compression techniques for different types of data. It uses bit packing and VInt compression for posting lists and numeric values, and LZ4 compression for stored fields and term vectors. It uses an FST (finite state transducer) to store the term dictionary. An FST is a special implementation of the trie data structure.
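To give a feeling for why posting lists compress well, here is a minimal Python sketch of delta encoding combined with variable-length integers (VInt: 7 payload bits per byte, high bit set when more bytes follow). This only illustrates the idea and is not Lucene's actual code; the function names `encode_vint`, `encode_postings` and `decode_postings` are made up for this example, and the document IDs are assumed to be sorted in ascending order.

```python
def encode_vint(n: int) -> bytes:
    """Encode a non-negative integer using 7 bits per byte, low-order bits first."""
    if n < 0:
        raise ValueError("VInt encoding expects a non-negative integer")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)                      # last byte has the high bit clear
    return bytes(out)


def encode_postings(doc_ids):
    """Store each doc ID as the gap to the previous one, VInt-encoded."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:             # assumes ascending doc IDs
        out += encode_vint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes):
    """Reverse encode_postings: read the VInt gaps and sum them back up."""
    doc_ids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:                # continuation bit: keep reading
            shift += 7
        else:                          # gap complete: rebuild the doc ID
            prev += value
            doc_ids.append(prev)
            value, shift = 0, 0
    return doc_ids


doc_ids = [3, 7, 150, 151, 300000]
encoded = encode_postings(doc_ids)
print(len(encoded), "bytes instead of", 4 * len(doc_ids))
```

Small gaps fit in a single byte, so a term that occurs in many documents gets a dense, cheap posting list. The practical consequence for index size is that you gain more from indexing fewer fields or fewer distinct terms than from trying to tune the compression itself.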
Upvotes: 4