user779712

Reputation: 23

Java - Phonetic transcription - Convert the lia_phon format to SAMPA

I have a string that is the phonetic transcription of a text, in what is called the lia_phon format (a French phonemizer). The string looks something like this:

ttoujjourr

This string is the phonetic transcription of the French word "toujours" (which means "always").

What I want to do is convert this string into the SAMPA format, given the list of equivalences between lia_phon phonemes and SAMPA ones.

So for instance, we have:

(LIA_phon, SAMPA)

tt, t

ou, u

jj, Z

rr, R

So the word "toujours" in the SAMPA format is tuZuR.

I'd like to convert the word automatically in Java. Any ideas on how to do it? I'm working with the TTS system Mary TTS, which works exclusively with SAMPA phonemes.

Thanks a lot,

Emma

Upvotes: 2

Views: 923

Answers (2)

Asaph

Reputation: 162811

Assuming the LIA_phon phonemes are always 2 characters long, you could create a simple Map to store the conversions. Then you could write a method that steps through a LIA_phon input string 2 characters at a time, looks up each 2-character phoneme in the map, and appends the corresponding SAMPA phoneme to a StringBuilder instance. Below, I've written an implementation and confirmed it works with a unit test (also included below).

LiaPhon.java

import java.util.HashMap;
import java.util.Map;

public class LiaPhon {
    private final static Map<String,String> LIA_PHONE_TO_SAMPA = new HashMap<String,String>();
    static {
        LIA_PHONE_TO_SAMPA.put("tt", "t");
        LIA_PHONE_TO_SAMPA.put("ou", "u");
        LIA_PHONE_TO_SAMPA.put("jj", "Z");
        LIA_PHONE_TO_SAMPA.put("rr", "R");
        // etc.
    }

    public static String liaPhone2SAMPA(String liaPhon) {
        int length = liaPhon.length();
        if (length % 2 != 0) {
            throw new IllegalArgumentException("LIA_phon must contain an even number of characters!");
        }
        StringBuilder sampa = new StringBuilder();
        for (int i=0; i<length; i+=2) {
            String liaPhonPhoneme = liaPhon.substring(i, i+2);
            String sampaPhoneme = LIA_PHONE_TO_SAMPA.get(liaPhonPhoneme);
            if (sampaPhoneme == null) {
                throw new IllegalArgumentException("Unrecognized LIA_phon phoneme: " + liaPhonPhoneme);
            }
            sampa.append(sampaPhoneme);
        }
        return sampa.toString();
    }
}

LiaPhonTest.java

import static org.junit.Assert.*;

import org.junit.Test;

public class LiaPhonTest {
    @Test
    public void testLiaPhone2SAMPA() {
        assertEquals("tuZuR", LiaPhon.liaPhone2SAMPA("ttoujjourr"));
    }

    @Test(expected=IllegalArgumentException.class)
    public void testLiaPhone2SAMPAWithOddNumberOfLetters() {
        LiaPhon.liaPhone2SAMPA("ttoujjour");
    }   

    @Test(expected=IllegalArgumentException.class)
    public void testLiaPhone2SAMPAWithInvalidPhoneme() {
        LiaPhon.liaPhone2SAMPA("ttoujj$$ourr");
    }   
}
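
If it turns out that not every LIA_phon phoneme is exactly 2 characters long, the fixed-step loop above no longer applies. One possible variation is a greedy longest-match scan: at each position, try the longest key in the map first and fall back to shorter ones. This is only a sketch under that assumption. The class name GreedyLiaPhon and the method toSampa are made up for illustration, and the table only holds the four pairs from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the answer above): greedy longest-match lookup.
final class GreedyLiaPhon {
    private static final Map<String, String> TABLE = new HashMap<>();
    private static int maxKeyLength = 0;
    static {
        TABLE.put("tt", "t");
        TABLE.put("ou", "u");
        TABLE.put("jj", "Z");
        TABLE.put("rr", "R");
        // Add the remaining pairs from your equivalence list here.
        for (String key : TABLE.keySet()) {
            maxKeyLength = Math.max(maxKeyLength, key.length());
        }
    }

    static String toSampa(String liaPhon) {
        StringBuilder sampa = new StringBuilder();
        int i = 0;
        while (i < liaPhon.length()) {
            String match = null;
            // Try the longest candidate first, then progressively shorter ones.
            int longest = Math.min(maxKeyLength, liaPhon.length() - i);
            for (int len = longest; len > 0; len--) {
                String candidate = liaPhon.substring(i, i + len);
                if (TABLE.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException(
                    "Unrecognized LIA_phon phoneme at index " + i + ": " + liaPhon.substring(i));
            }
            sampa.append(TABLE.get(match));
            i += match.length(); // advance past the phoneme that was consumed
        }
        return sampa.toString();
    }
}
```

With only the four 2-character keys, GreedyLiaPhon.toSampa("ttoujjourr") behaves like the fixed-step version. The difference only shows up once keys of other lengths are added to the table.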

Upvotes: 1

I82Much

Reputation: 27326

Sounds like a fairly straightforward string replace operation.

// Requires: import java.util.HashMap; import java.util.Map;
public static Map<String, String> liaToSampa = new HashMap<String, String>();
static {
    liaToSampa.put("tt", "t");
    liaToSampa.put("ou", "u");
    liaToSampa.put("jj", "Z");
    liaToSampa.put("rr", "R");
    // etc
}

public static String translateLiaToSampa(String liaWord) {
   String result = liaWord;
   for (Map.Entry<String, String> entry : liaToSampa.entrySet()) {
       String liaPhoneme = entry.getKey();
       String sampaPhoneme = entry.getValue();
       // replace() treats the key as literal text; replaceAll() would parse it as a regex
       result = result.replace(liaPhoneme, sampaPhoneme);
   }
   return result;
}
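
One thing to watch with repeated replacement is that each pass sees the output of the previous passes, so the result can depend on the order in which the map is iterated. A possible alternative, sketched here and not tested against a full phoneme table, is to do the replacement in a single left-to-right pass with one regex built from the map keys, so text that has already been converted is never matched again. The class name SinglePassLiaToSampa and the method translate are invented for this example, and characters that match no key are copied through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant of the replace-based approach.
final class SinglePassLiaToSampa {
    private static final Map<String, String> LIA_TO_SAMPA = new LinkedHashMap<>();
    static {
        LIA_TO_SAMPA.put("tt", "t");
        LIA_TO_SAMPA.put("ou", "u");
        LIA_TO_SAMPA.put("jj", "Z");
        LIA_TO_SAMPA.put("rr", "R");
        // etc
    }

    // Alternation of all keys, each quoted so it is matched literally.
    private static final Pattern PHONEME = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder alternation = new StringBuilder();
        for (String key : LIA_TO_SAMPA.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        return Pattern.compile(alternation.toString());
    }

    static String translate(String liaWord) {
        Matcher matcher = PHONEME.matcher(liaWord);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String sampaPhoneme = LIA_TO_SAMPA.get(matcher.group());
            // quoteReplacement keeps any '$' or '\' in the SAMPA symbol literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(sampaPhoneme));
        }
        matcher.appendTail(result); // copy whatever follows the last match
        return result.toString();
    }
}
```

If unknown input should be an error instead of being passed through, the fixed-step lookup in the other answer is the stricter choice.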

Upvotes: 0
