Reputation: 23
I have a string that is the phonetic transcription of a text, in what is called the lia_phon format (a French phonemizer). The string looks something like this:
ttoujjourr
This string is the phonetic transcription of the French word "toujours" (meaning "always").
What I want to do is convert this string into the SAMPA format, given the list of equivalences between lia_phon phonemes and SAMPA ones.
So for instance, we have:
(LIA_phon, SAMPA)
tt, t
ou, u
jj, Z
rr, R
So, the word "toujours", in the SAMPA format is tuZuR.
I'd like to convert the word automatically in Java. Any idea how to do it? I'm working for the TTS system Mary TTS, which works exclusively with SAMPA phonemes.
Thanks a lot,
Emma
Upvotes: 2
Views: 923
Reputation: 162811
Assuming the LIA_phon phonemes are always 2 characters long, you could create a simple Map to store the conversions. Then you could write a method that iterates through a LIA_phon input string 2 characters at a time, looks up each 2-character phoneme in your map, and appends its SAMPA equivalent to a StringBuilder instance. Below, I've written an implementation and confirmed it works with a unit test (also included below).
import java.util.HashMap;
import java.util.Map;

public class LiaPhon {
    private static final Map<String,String> LIA_PHONE_TO_SAMPA = new HashMap<String,String>();
    static {
        LIA_PHONE_TO_SAMPA.put("tt", "t");
        LIA_PHONE_TO_SAMPA.put("ou", "u");
        LIA_PHONE_TO_SAMPA.put("jj", "Z");
        LIA_PHONE_TO_SAMPA.put("rr", "R");
        // etc.
    }

    public static String liaPhone2SAMPA(String liaPhon) {
        int length = liaPhon.length();
        if (length % 2 != 0) {
            throw new IllegalArgumentException("LIA_phon must contain an even number of characters!");
        }
        StringBuilder sampa = new StringBuilder();
        // Consume the input one 2-character phoneme at a time
        for (int i = 0; i < length; i += 2) {
            String liaPhonPhoneme = liaPhon.substring(i, i + 2);
            String sampaPhoneme = LIA_PHONE_TO_SAMPA.get(liaPhonPhoneme);
            if (sampaPhoneme == null) {
                throw new IllegalArgumentException("Unrecognized LIA_phon phoneme: " + liaPhonPhoneme);
            }
            sampa.append(sampaPhoneme);
        }
        return sampa.toString();
    }
}
import static org.junit.Assert.*;
import org.junit.Test;

public class LiaPhonTest {
    @Test
    public void testLiaPhone2SAMPA() {
        assertEquals("tuZuR", LiaPhon.liaPhone2SAMPA("ttoujjourr"));
    }

    @Test(expected=IllegalArgumentException.class)
    public void testLiaPhone2SAMPAWithOddNumberOfLetters() {
        LiaPhon.liaPhone2SAMPA("ttoujjour");
    }

    @Test(expected=IllegalArgumentException.class)
    public void testLiaPhone2SAMPAWithInvalidPhoneme() {
        LiaPhon.liaPhone2SAMPA("ttoujj$$ourr");
    }
}
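If it turns out that not every LIA_phon symbol is exactly 2 characters long (I have not checked the full symbol table, so this is only a precaution), the fixed-step loop above can be swapped for a greedy longest-match scan. The following is a minimal sketch under that assumption; the class name LiaPhonGreedy and its methods are hypothetical, and the table holds only the four pairs from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of LIA_PHON or Mary TTS.
class LiaPhonGreedy {
    private static final Map<String, String> TABLE = new HashMap<>();
    private static int maxKeyLength = 0;

    static {
        // Only the four pairs given in the question; extend with the full table.
        add("tt", "t");
        add("ou", "u");
        add("jj", "Z");
        add("rr", "R");
    }

    private static void add(String lia, String sampa) {
        TABLE.put(lia, sampa);
        maxKeyLength = Math.max(maxKeyLength, lia.length());
    }

    static String toSampa(String liaPhon) {
        StringBuilder sampa = new StringBuilder();
        int i = 0;
        while (i < liaPhon.length()) {
            String match = null;
            // Try the longest possible symbol first, then shorter ones.
            int longest = Math.min(maxKeyLength, liaPhon.length() - i);
            for (int len = longest; len >= 1; len--) {
                String candidate = liaPhon.substring(i, i + len);
                if (TABLE.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException(
                        "Unrecognized LIA_phon input at index " + i + ": " + liaPhon.substring(i));
            }
            sampa.append(TABLE.get(match));
            i += match.length();
        }
        return sampa.toString();
    }
}
```

With the four entries above, LiaPhonGreedy.toSampa("ttoujjourr") gives "tuZuR", and input that cannot be split into known symbols still fails with an IllegalArgumentException rather than being passed through silently.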
Upvotes: 1
Reputation: 27326
Sounds like a fairly straightforward string replace operation.
import java.util.HashMap;
import java.util.Map;

public static Map<String, String> liaToSampa = new HashMap<String, String>();
static {
    liaToSampa.put("tt", "t");
    liaToSampa.put("ou", "u");
    liaToSampa.put("jj", "Z");
    liaToSampa.put("rr", "R");
    // etc.
}

public static String translateLiaToSampa(String liaWord) {
    String result = liaWord;
    for (Map.Entry<String, String> entry : liaToSampa.entrySet()) {
        String liaPhoneme = entry.getKey();
        String sampaPhoneme = entry.getValue();
        // replace() treats the key as literal text; replaceAll() would interpret it as a regex
        result = result.replace(liaPhoneme, sampaPhoneme);
    }
    return result;
}
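One caveat with replacing one phoneme after another: each pass runs over the output of the previous pass, so with a larger table a SAMPA symbol that was just written could be picked up again as part of a later LIA_phon key, and HashMap iteration order is unspecified. A possible way around this is to do the whole translation in a single left-to-right pass with a regex alternation. This is only a sketch; the class name LiaSinglePass and its members are made up for illustration, and the table holds only the four pairs from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of LIA_PHON or Mary TTS.
class LiaSinglePass {
    // LinkedHashMap keeps insertion order, so the alternation order is predictable.
    private static final Map<String, String> LIA_TO_SAMPA = new LinkedHashMap<>();
    static {
        LIA_TO_SAMPA.put("tt", "t");
        LIA_TO_SAMPA.put("ou", "u");
        LIA_TO_SAMPA.put("jj", "Z");
        LIA_TO_SAMPA.put("rr", "R");
        // etc.
    }

    // One pattern that matches any known LIA_phon symbol literally.
    private static final Pattern PHONEME = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder alternation = new StringBuilder();
        for (String key : LIA_TO_SAMPA.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        return Pattern.compile(alternation.toString());
    }

    static String translate(String liaWord) {
        Matcher matcher = PHONEME.matcher(liaWord);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String sampa = LIA_TO_SAMPA.get(matcher.group());
            // quoteReplacement keeps '$' and '\' in SAMPA symbols literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(sampa));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Here LiaSinglePass.translate("ttoujjourr") returns "tuZuR". Like the replace loop, this copies unknown characters through unchanged instead of rejecting them, and if the keys ever differ in length the longer ones should be inserted first so that the alternation prefers them.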
Upvotes: 0