Konstantin

Reputation: 3123

Decompose words into letters with Ruby

In my language there are composite (compound) letters that consist of more than one character, e.g. "ty", "ny" and even "tty" and "nny". I would like to write a Ruby method (spell) that tokenizes words into letters according to this alphabet:

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h

The keys of the resulting hash are the letters and composite letters of the alphabet, and the values show whether a letter is a consonant ("c") or a vowel ("v"), because later I would like to use this hash to decompose words into syllables. Of course, the method does not need to handle compound words in which a composite letter is accidentally formed at the boundary between the component words.

Examples:

spell("csobolyó") => [ "cs", "o", "b", "o", "ly", "ó" ]
spell("nyirettyű") => [ "ny", "i", "r", "e", "tty", "ű" ]
spell("dzsesszmuzsikus") => [ "dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s" ]
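
As a small illustration of how the hash would be used after tokenizing, the sketch below maps an already tokenized word to its consonant/vowel pattern. The names `ABC` and `pattern` are hypothetical and only used for this example; the alphabet itself is the one given above.

```ruby
# Alphabet from the question: letter => "c" (consonant) or "v" (vowel).
ABC = [
  *%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
  *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }
].to_h.freeze

# Hypothetical helper: look up the class of each letter and join the results.
# Hash#fetch raises KeyError for a letter that is not in the alphabet.
def pattern(letters, abc = ABC)
  letters.map { |letter| abc.fetch(letter) }.join
end

puts pattern(%w[cs o b o ly ó])  # prints "cvcvcv"
```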

Upvotes: 2

Views: 347

Answers (3)

Konstantin

Reputation: 3123

Meanwhile, I managed to write a method that works, but it is 5x slower than String#scan:

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h

def spell(w,abc)
    s=w.split(//)  # characters of the word
    p=""           # pending characters not yet emitted as a letter
    t=[]           # letters found so far

    for i in 0..s.size-1 do
      p << s[i]
      if i>=s.size-2 then

       if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]

       elsif abc[p[0]]!=nil then
          t.push p[0]
          p=p[1..-1]

       end 

      elsif p.size==3 then
       if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]

       elsif abc[p[0]]!=nil then
          t.push p[0]
          p=p[1..-1]
       end
      end
    end

    if p.size>0 then
        if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]
      end
    end

    if p.size>0 then
      t.push p
    end
    return t
end

Upvotes: 0

Simple Lime

Reputation: 11035

You might be able to get started by looking at String#scan, which appears to give decent results for your examples:

"csobolyó".scan(Regexp.union(abc.keys))
# => ["cs", "o", "b", "o", "ly", "ó"]
"nyirettyű".scan(Regexp.union(abc.keys))
# => ["ny", "i", "r", "e", "tty", "ű"]
"dzsesszmuzsikus".scan(Regexp.union(abc.keys))
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]

The last case doesn't match your expected output, but it matches your statement in the comments:

"I sorted the letters in the alphabet: if a letter appears earlier, then it should be recognized instead of its simple letters. When a word contains "dzs", it should be treated as "dzs" and not as "d" and "zs"."
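
Since the regex tries the alternatives of Regexp.union in the order given, this approach relies on longer letters coming before their shorter prefixes. Here is a minimal sketch that makes that explicit by sorting the keys by length before building the pattern. The constant names `ABC` and `LETTER` are made up for this example, and sorting by length is my assumption about what you want, not something your hash requires.

```ruby
ABC = [
  *%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
  *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }
].to_h.freeze

# Longest keys first, so "dzs" is tried before "dz", "zs" and "d".
# The pattern is built once instead of on every call.
LETTER = Regexp.union(ABC.keys.sort_by { |key| -key.length })

# Returns the letters of the word; characters that are not in the
# alphabet are silently skipped by String#scan.
def spell(word)
  word.scan(LETTER)
end

p spell("dzsesszmuzsikus")
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
```

Note that this cannot detect morpheme boundaries: if I am not mistaken, a word such as "község" (köz + ség) still comes out with "zs" as one letter, which is the case you said does not need to be handled.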

Upvotes: 2

Siva Praveen

Reputation: 2333

I didn't use the order in which you sorted the letters; instead, a longer letter (one with more characters) takes precedence over a shorter one.

def spell word
  abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
  current_position = 0
  maximum_current_position = 2
  maximum_possible_position = word.length
  split_word = []
  while current_position < maximum_possible_position do 
    current_word = set_current_word word, current_position, maximum_current_position
    if abc[current_word] != nil
      current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
      split_word.push(current_word)
    else
      maximum_current_position = update_max_current_position maximum_current_position
      current_word = set_current_word word, current_position, maximum_current_position
      if abc[current_word] != nil
        current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
        split_word.push(current_word)
      else
        maximum_current_position = update_max_current_position maximum_current_position
        current_word = set_current_word word, current_position, maximum_current_position
        if abc[current_word] != nil
          current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position          
          split_word.push(current_word)
        else
          puts 'This word cannot be formed in the current language'
          break
        end
      end
    end
  end
  split_word
end

def update_max_current_position max_current_position
  max_current_position - 1
end

def update_current_position_and_max_current_position current_position,max_current_position
    current_position = max_current_position + 1
    max_current_position = current_position + 2
    return current_position, max_current_position
end

def set_current_word word, current_position, max_current_position
  word[current_position..max_current_position]
end

puts "csobolyó => #{spell("csobolyó")}"
puts "nyirettyű => #{spell("nyirettyű")}"
puts "dzsesszmuzsikus => #{spell("dzsesszmuzsikus")}"

Output

csobolyó => ["cs", "o", "b", "o", "ly", "ó"]
nyirettyű => ["ny", "i", "r", "e", "tty", "ű"]
dzsesszmuzsikus => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
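
The same longest-match-first idea can also be sketched more compactly. This is only an illustration under the assumption that no letter is longer than three characters; `ABC` and `spell_longest_first` are names made up for this sketch, and raising an error for unknown characters is a choice of mine rather than something the question asks for.

```ruby
ABC = [
  *%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
  *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }
].to_h.freeze

def spell_longest_first(word, abc = ABC)
  letters = []
  pos = 0
  while pos < word.length
    # Try the 3-, 2- and 1-character slices at pos, in that order,
    # and take the first one that is a known letter.
    candidate = 3.downto(1).map { |n| word[pos, n] }.find { |c| abc.key?(c) }
    raise ArgumentError, "no letter matches at position #{pos}" if candidate.nil?
    letters << candidate
    pos += candidate.length
  end
  letters
end

p spell_longest_first("nyirettyű")
# => ["ny", "i", "r", "e", "tty", "ű"]
```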

Upvotes: 1
