What do "documents" and "cores" mean in Solr, and how can I use them?

Question

I would like to understand how Solr fits together, using a relational database analogy. From what I have figured out so far, "documents" in Solr are similar to "rows" in SQL (if my SQL table has 100 rows, I need to insert 100 documents in Solr) and "cores" are similar to "tables" (or databases?!?).

The question is: if I have 2 sets of unrelated information, let's say a table with car information (id, name, series, color, description) and a table with user information (id, name, address, age, sex), where do I insert these things in Solr? Do I make 2 cores (core_car, core_user) and populate each of them with documents from the corresponding table? Or do I make 1 core (core_general) and insert all the documents from both tables there (separated somehow, in a way I don't know yet)?

In the first case, with 2 cores, I feel like I am creating 2 databases with 1 table in each (overkill). In the second, I feel like I am creating 1 table with all the unrelated fields mushed together (this wouldn't be the case if there were some form of separation, which I don't know of at the moment).

Please confirm or correct my assumptions. Thank you in advance.

phanin · Accepted Answer

Great that you explored before posting the question. Here's my opinion.

Solr Document: Probably a more suitable way of perceiving this concept is thinking in terms of results. Each Solr document is nothing but one result entry in your result set after executing a search query.

If you were indexing Wikipedia, each article would be a Solr Document. When you search for "sorting algorithms", the results you would like to see are "bubble sort", "merge sort", etc. Each of them is an article, a Solr document, and a result in the result-set.

If you want to relate this back to RDBMS concepts, I would say that each search result (i.e. a Solr Document) could be a row in the result-set of a SQL query. That row could be a row from a single table, or a row from JOINed tables.
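As a rough illustration of the "one row, one document" idea, here is a minimal sketch that turns the rows of a SQL result-set into flat dictionaries. The `car` table, its sample rows, and the `rows_to_docs` helper are made up for this example; the only assumption about Solr is that its JSON update handler accepts a JSON array of such documents.

```python
import json
import sqlite3

# Hypothetical table mirroring the car example from the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose their column names via keys()
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, name TEXT, series TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO car (id, name, series, color) VALUES (?, ?, ?, ?)",
    [(1, "Golf", "VII", "red"), (2, "Civic", "X", "blue")],
)


def rows_to_docs(rows):
    """Turn each result-set row into one flat dict, i.e. one document."""
    return [{key: row[key] for key in row.keys()} for row in rows]


docs = rows_to_docs(
    conn.execute("SELECT id, name, series, color FROM car ORDER BY id")
)

# Assumption: a JSON array of documents like this is what gets posted
# to a core's update handler. Nothing is sent over the network here.
payload = json.dumps(docs)
```

The same helper would work unchanged on a JOINed query, since it only looks at the columns the result-set exposes.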

Solr Core: a core is essentially a wrapper around ONE Lucene index. Each Solr web-app can host multiple Solr Cores.
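To make the "multiple cores in one web-app" point concrete, here is a small sketch of how each core is addressed on its own path. The base address is an assumed default local install, the core names are the ones from the question, and `select_url` is a hypothetical helper; it only builds URLs and does not contact a server.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed default local address


def select_url(core, query):
    """Build the search URL for one core; each core is queried on its own path."""
    return f"{SOLR_BASE}/{core}/select?{urlencode({'q': query})}"


# Two cores served by the same Solr instance, searched independently.
car_url = select_url("core_car", "color:red")
user_url = select_url("core_user", "name:alice")
```

A query against `core_car` never sees documents from `core_user`, because each core wraps its own index.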

The best way to speed up your understanding is to avoid relating concepts in Solr to RDBMS.

Explore what Solr offers that an RDBMS doesn't (efficiently).

Here's another link that might help you : Solr Terminology

Your use-case

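The beauty of Solr/Lucene is its flexible schema, or I'd say almost no schema at the Lucene level. Each document can have totally different fields and attributes from the previous document indexed (in Solr, provided the schema covers those fields, for example through dynamic fields).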

It is perfectly fine to have different types of documents (car, person, etc.) in the same Lucene index (Solr Core in your case), as long as they scale well together.

For example, if you have 500 million car entries and 3 billion person entries, it makes sense to index them separately. If you have 1 million persons and 500k cars, you can put all of them in the same index, with an identifier field containing the entity type.
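Here is a minimal sketch of the single-index option, assuming a field named `type` as the entity identifier (the field name, the sample documents, and both helpers are hypothetical). It also prefixes each id with the entity type, on the assumption that `id` is the core's unique key, so that car 1 and user 1 cannot overwrite each other. Restricting a search to one entity type is then a filter query (`fq`) on that field.

```python
from urllib.parse import urlencode


def tag(docs, entity_type):
    """Copy each doc, adding a type field and a type-prefixed id."""
    return [
        {**doc, "id": f"{entity_type}-{doc['id']}", "type": entity_type}
        for doc in docs
    ]


cars = tag([{"id": 1, "name": "Golf", "color": "red"}], "car")
users = tag([{"id": 1, "name": "Alice", "age": 30}], "user")
all_docs = cars + users  # both kinds of document go into the same core


def select_params(query, entity_type=None):
    """Build query-string parameters, optionally filtered to one entity type."""
    params = [("q", query)]
    if entity_type is not None:
        params.append(("fq", f"type:{entity_type}"))
    return urlencode(params)


cars_only = select_params("red", "car")  # search limited to car documents
everything = select_params("red")        # search across both entity types
```

Leaving out the filter searches both entity types at once, which is the main thing a shared index gives you over two separate cores.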

Your question is very subjective. Not everyone would agree with what I said. Deciding between one core and multiple cores depends on many more factors.

For example,

Do those two entities (persons and cars) complement each other and form a logical chunk that supports a product feature?
Are there any situations where you'd have to get both types of results for a single query?
How often do you update each type of entity? (An update in Solr is generally a delete and re-add of the whole document, even when only one field changes.)
Do they belong to different product features?
Are they owned by different teams, etc.?
