bneigher

Reputation: 848

Languages in Apache Solr

I am looking for a way to expand my current Apache Solr (4.x) setup so that it can support a large number of languages. I would like to take a multicore approach, and have set up Solr with an English core and a Japanese core (for starters). To complicate things, I am given n .xml files containing the data that Solr will index. So, to be clear:

I have n languages and n .xml files (one .xml per language). The files are identical in markup; only the raw text differs.

My issue is that I can't figure out how to post, say, the english.xml file strictly to the English core and the japanese.xml file strictly to the Japanese core, so that when I visit my page at:

www.example.com/us/index.html, I am looking at the english.xml indexed results, and

www.example.com/jp/index.html gives me the japanese.xml indexed results.

Strictly speaking, one schema would be enough because the different language .xml files are structured identically tagwise, but I duplicated the schema for each core because each schema file will be optimized for its respective language.

if (TLDR) {

How would I independently post:
english.xml -> core-english
japanese.xml -> core-japanese


Or what would be a better approach that gives me independent facet and search groups so that I can localize my pages?

}

Obviously I don't want to have n different instances of Solr running.

Upvotes: 0

Views: 234

Answers (1)

Aujasvi Chitkara

Reputation: 939

Benjamin, your approach is perfect. Multicore is a great way to do it.

Suppose your server is at IP 10.10.10.10 and Solr is running on port 8983; then your multicore setup should look something like:

10.10.10.10:8983/solr/us

10.10.10.10:8983/solr/jp

10.10.10.10:8983/solr/fr

...and so on

A couple of things to keep in mind:

  • Each core will have its own conf folder in it
  • Inside each conf folder, you will have solrconfig.xml, schema.xml, synonyms.txt and other config files specific to that country
  • Field definition will be different for every country, specified in its schema.xml
  • e.g. the title field would use fieldType text_general for US but text_fr for France
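The per-core layout described above can be sketched in shell. This is only an illustration under assumptions: the Solr home path is whatever you pass in, the core names (us, jp, fr) mirror the example URLs, and the second helper merely prints a CoreAdmin CREATE URL without contacting the server. Each conf folder still has to be filled with that core's own solrconfig.xml and schema.xml before the core can load.

```shell
#!/bin/sh
# Hypothetical helper: create one conf folder per core under a Solr home.
# Usage: make_core_dirs /path/to/solr/home us jp fr
make_core_dirs() {
    solr_home=$1
    shift
    for core in "$@"; do
        # Each core keeps its own solrconfig.xml, schema.xml, synonyms.txt, ...
        mkdir -p "$solr_home/$core/conf"
    done
}

# Hypothetical helper: print the CoreAdmin URL that would register a core
# whose instance directory has the same name as the core.
# Usage: core_create_url http://10.10.10.10:8983/solr jp
core_create_url() {
    base=$1
    core=$2
    printf '%s/admin/cores?action=CREATE&name=%s&instanceDir=%s\n' \
        "$base" "$core" "$core"
}
```

Naming the instance directory after the core keeps the mapping between URL, folder and language obvious; nothing in Solr requires it.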

POSTING XML

This is how you will post content of various XML files for different countries:

US:

curl "http://10.10.10.10:8983/solr/us/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">First Item</field></doc><doc><field name="id">2</field><field name="title">Second Item</field></doc></add>'

FR:

curl "http://10.10.10.10:8983/solr/fr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">premier article</field></doc><doc><field name="id">2</field><field name="title">deuxième article</field></doc></add>'

JP:

curl "http://10.10.10.10:8983/solr/jp/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">最初の項目</field></doc><doc><field name="id">2</field><field name="title">2番目の項目</field></doc></add>'
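Since the question is about whole files rather than inline documents, here is a minimal sketch of posting one file per core. It assumes each .xml file is already in Solr's `<add><doc>...</doc></add>` update format; the function names and the SOLR_BASE variable are made up for illustration, and english.xml / japanese.xml are the file names from the question.

```shell
#!/bin/sh
# Assumed base URL, taken from the example address in this answer.
SOLR_BASE=${SOLR_BASE:-http://10.10.10.10:8983/solr}

# Build the update URL for one core (hypothetical helper).
update_url() {
    printf '%s/%s/update?commit=true\n' "$SOLR_BASE" "$1"
}

# Post one XML file to one core (hypothetical helper).
# The "@" prefix tells curl to read the request body from the file.
post_file() {
    core=$1
    file=$2
    curl "$(update_url "$core")" \
        -H "Content-Type: text/xml; charset=utf-8" \
        --data-binary "@$file"
}

# Example calls, one file per core:
#   post_file us english.xml
#   post_file jp japanese.xml
```

Because the core name is part of the URL, a file can only end up in the core it was posted to, which is what keeps the languages separate.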

SEARCHING

You can search each country independently by just querying its core:

Search query for US:

http://10.10.10.10:8983/solr/us/select?q=john

Search query for JP:

http://10.10.10.10:8983/solr/jp/select?q=ジョン
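For the localized pages, one possible approach is to reuse the site path segment (us, jp, ...) as the core name, so that /us/index.html queries the us core and /jp/index.html queries the jp core. The sketch below assumes that naming convention; the helper names are hypothetical, and faceting on title is only a placeholder for whichever field you actually facet on.

```shell
#!/bin/sh
# Assumed base URL, taken from the example address in this answer.
SOLR_BASE=${SOLR_BASE:-http://10.10.10.10:8983/solr}

# Build the select URL for the core named after a site path segment
# (hypothetical helper).
select_url() {
    printf '%s/%s/select\n' "$SOLR_BASE" "$1"
}

# Query one core with faceting enabled (hypothetical helper).
# -G sends the parameters as a query string, and --data-urlencode
# percent-encodes non-ASCII terms such as ジョン.
search_core() {
    core=$1
    query=$2
    curl -G "$(select_url "$core")" \
        --data-urlencode "q=$query" \
        --data-urlencode "wt=json" \
        --data-urlencode "facet=true" \
        --data-urlencode "facet.field=title"
}

# Example calls:
#   search_core us john
#   search_core jp ジョン
```

Facet counts returned this way are computed per core, so each language gets its own independent facet groups.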

Upvotes: 1
