vach

Reputation: 11377

Hyphenation for different languages with Java

The problem: given a string, which may be in any of several languages, we have to hyphenate it.

What I tried: hyphenator-j, but it seems to work only for English. I'm not sure how to hyphenate other languages, and I couldn't find free .tex files for them.

What options do we have for hyphenating text in different languages in Java?

Upvotes: 4

Views: 962

Answers (2)

rzo1

Reputation: 5751

hyphenator-j, or a forked variant of it, can use the original .tex hyphenation tables.

These tables can be found either:

  • On your local machine, if you have already installed a TeX environment such as MiKTeX. In this case, the .tex hyphenation tables can be found in \tex\generic\hyphen
  • On the web page of the TeX Users Group and in the corresponding Git repository: here

Once you have obtained the .tex files you are interested in, you can load them using the API provided by hyphenator-j.
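To give a rough idea of what such a table contains, here is a minimal sketch that pulls the raw patterns out of the text of a .tex file. It is not the hyphenator-j loading API (check the library's own documentation for that); the class and method names are made up for illustration, and it assumes a simple file with a single \patterns{...} block and % comments, ignoring any macros or encoding commands a real file may contain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hyphenator-j.
final class TexPatternReader {

    // Returns the whitespace-separated entries of the first \patterns{...} block.
    static List<String> readPatterns(String tex) {
        List<String> result = new ArrayList<>();
        String marker = "\\patterns{";
        int start = tex.indexOf(marker);
        if (start < 0) {
            return result; // no pattern block found
        }
        int end = tex.indexOf('}', start);
        if (end < 0) {
            return result; // unterminated block, give up
        }
        String body = tex.substring(start + marker.length(), end);
        for (String line : body.split("\\R")) {
            // Everything after '%' is a TeX comment.
            int comment = line.indexOf('%');
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }
}
```

Each returned entry is a pattern such as 1na or hy3ph, where the digits mark how desirable a break is at that position.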

Upvotes: 4

Dylan Meeus

Reputation: 5802

Given enough time and willpower, you could implement hyphenation yourself, based for example on this thesis: http://www.tug.org/docs/liang/. Implementing hyphenation yourself is not an easy task, though, so you might prefer one of the alternatives.
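For a sense of what the pattern-based approach from that thesis involves, here is a minimal sketch of the core matching step. The class name, method names, and the leftMin/rightMin parameters are my own choices, and the handful of patterns in the usage note is only a tiny illustrative subset; a real implementation needs the full pattern file for each language plus an exception list.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch of pattern-based hyphenation, not a complete implementation.
final class LiangSketch {
    // Pattern letters -> break values for each gap (letters.length() + 1 entries).
    private final Map<String, int[]> patterns = new HashMap<>();
    private int maxLen = 0;

    // Accepts a TeX-style pattern such as "hy3ph" or "1na".
    void addPattern(String pattern) {
        StringBuilder letters = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (!Character.isDigit(c)) {
                letters.append(c);
            }
        }
        int[] values = new int[letters.length() + 1];
        int pos = 0;
        for (char c : pattern.toCharArray()) {
            if (Character.isDigit(c)) {
                values[pos] = c - '0'; // digit applies to the gap before the next letter
            } else {
                pos++;
            }
        }
        patterns.put(letters.toString(), values);
        maxLen = Math.max(maxLen, letters.length());
    }

    // Inserts '-' at every gap whose highest matching value is odd.
    String hyphenate(String word, int leftMin, int rightMin) {
        String w = "." + word.toLowerCase(Locale.ROOT) + "."; // dots mark word boundaries
        int[] scores = new int[w.length() + 1];
        for (int i = 0; i < w.length(); i++) {
            int limit = Math.min(w.length(), i + maxLen);
            for (int j = i + 1; j <= limit; j++) {
                int[] values = patterns.get(w.substring(i, j));
                if (values == null) {
                    continue;
                }
                for (int k = 0; k < values.length; k++) {
                    scores[i + k] = Math.max(scores[i + k], values[k]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int q = 0; q < word.length(); q++) {
            // scores[q + 1] is the gap before word.charAt(q) because of the leading dot.
            boolean allowed = q >= leftMin && word.length() - q >= rightMin;
            if (allowed && scores[q + 1] % 2 == 1) {
                out.append('-');
            }
            out.append(word.charAt(q));
        }
        return out.toString();
    }
}
```

With the patterns hy3ph, he2n, hena4, hen5at, 1na, n2at, 1tio, 2io and o2n added, hyphenate("hyphenation", 2, 3) returns hy-phen-ation. The hard part is not this loop but obtaining good patterns for every language you need.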

Hyphenator.js

Yes, this is a JavaScript project. However, it is possible to call JavaScript functions from Java. You can find more information about this here: http://docs.oracle.com/javase/6/docs/technotes/guides/scripting/programmer_guide/index.html.

Hyphenator.js supports a wide variety of languages.
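Calling a JavaScript function from Java through the javax.script API can be sketched roughly as follows. The JsBridgeSketch class and the inline shout function are placeholders of mine, not part of Hyphenator.js, and I have not checked whether Hyphenator.js itself runs outside a browser. Note also that whether a JavaScript engine is available depends on your JDK (newer JDKs no longer bundle one), so the sketch returns null in that case.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical bridge class for illustration only.
final class JsBridgeSketch {

    // Evaluates the script, then calls the named top-level function.
    // Returns null when no usable JavaScript engine is installed.
    static String callJs(String script, String function, Object... args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (!(engine instanceof Invocable)) {
            return null; // engine missing or unable to invoke functions
        }
        try {
            engine.eval(script);
            Object result = ((Invocable) engine).invokeFunction(function, args);
            return String.valueOf(result);
        } catch (ScriptException | NoSuchMethodException e) {
            throw new IllegalStateException("JavaScript call failed", e);
        }
    }
}
```

For example, callJs("function shout(s) { return s.toUpperCase(); }", "shout", "hy-phen") evaluates the script and invokes shout with one argument. In a real setup the script string would be the contents of the JavaScript library you want to use.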

Scrape dictionaries

Many dictionaries list hyphenation points for their entries. You can find these online, though it will involve some searching. You could then scrape them for the hyphenation data, but this might be an uglier workaround than calling JavaScript from Java.

Either way, hyphenation is not an easy problem, and implementing it yourself looks like quite an annoying task, so the JavaScript project may be your best bet. Alternatively, you could write your own Java implementation based on Hyphenator.js. At least you would not be starting from scratch.

Upvotes: 2
