Reputation: 3123
In my language there are composite (compound) letters, which consist of more than one character, e.g. "ty", "ny" and even "tty" and "nny". I would like to write a Ruby method (spell) which tokenizes words into letters according to this alphabet:
abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
The keys of the resulting hash are the letters and composite letters of the alphabet, and the values show whether each letter is a consonant ("c") or a vowel ("v"), because later I would like to use this hash to decompose words into syllables. Of course, the method doesn't need to resolve compound words in which a composite letter is accidentally formed at the boundary between the component words.
Examples:
spell("csobolyó") => [ "cs", "o", "b", "o", "ly", "ó" ]
spell("nyirettyű") => [ "ny", "i", "r", "e", "tty", "ű" ]
spell("dzsesszmuzsikus") => [ "dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s" ]
Upvotes: 2
Views: 347
Reputation: 3123
Meanwhile I managed to write a method which works, but it is 5x slower than String#scan:
abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
def spell(w,abc)
  s=w.split(//)
  p=""
  t=[]
  for i in 0..s.size-1 do
    p << s[i]
    if i>=s.size-2 then
      if abc[p]!=nil then
        t.push p
        p=""
      elsif abc[p[0..-2]]!=nil then
        t.push p[0..-2]
        p=p[-1]
      elsif abc[p[0]]!=nil then
        t.push p[0]
        p=p[1..-1]
      end
    elsif p.size==3 then
      if abc[p]!=nil then
        t.push p
        p=""
      elsif abc[p[0..-2]]!=nil then
        t.push p[0..-2]
        p=p[-1]
      elsif abc[p[0]]!=nil then
        t.push p[0]
        p=p[1..-1]
      end
    end
  end
  if p.size>0 then
    if abc[p]!=nil then
      t.push p
      p=""
    elsif abc[p[0..-2]]!=nil then
      t.push p[0..-2]
      p=p[-1]
    end
  end
  if p.size>0 then
    t.push p
  end
  return t
end
Upvotes: 0
Reputation: 11035
You might be able to get started by looking at String#scan, which appears to be giving decent results for your examples:
"csobolyó".scan(Regexp.union(abc.keys))
# => ["cs", "o", "b", "o", "ly", "ó"]
"nyirettyű".scan(Regexp.union(abc.keys))
# => ["ny", "i", "r", "e", "tty", "ű"]
"dzsesszmuzsikus".scan(Regexp.union(abc.keys))
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
The last case doesn't match your expected output, but it matches your statement in the comments:
I sorted the letters in the alphabet: if a letter appears earlier, then it should be recognized instead of its simple letters. When a word contains "dzs" it should be considered to "dzs" and not to "d" and "zs"
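The calls above can be wrapped into the `spell` method the question asks for. This is only a minimal sketch: the constant names `ABC` and `LETTERS` are made up for illustration, and sorting the keys by length is an extra safeguard that is not needed for the alphabet as given (it is already ordered longest first). The pattern is built once rather than on every call.

```ruby
# Alphabet from the question: letter => "c" (consonant) or "v" (vowel).
ABC = [
  *%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
  *"eéuioöüóőúűáía".split(//).map { |z| [z, "v"] }
].to_h

# Regexp alternation tries alternatives left to right, so longer letters
# must come first. The sort is an assumption-free safeguard added here in
# case the hash is ever built in a different order.
LETTERS = Regexp.union(ABC.keys.sort_by { |k| -k.length })

# Split a word into the letters of the alphabet.
def spell(word)
  word.scan(LETTERS)
end

p spell("dzsesszmuzsikus")
# Consonant/vowel pattern of a word, looked up from the same hash.
p spell("csobolyó").map { |letter| ABC[letter] }.join
```

Note that `scan` silently skips any character that is not in the alphabet, so this version does not report invalid input.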
Upvotes: 2
Reputation: 2333
I didn't rely on the order in which you sorted the letters; instead, a longer letter takes precedence over a shorter one.
def spell word
  abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
  current_position = 0
  maximum_current_position = 2
  maximum_possible_position = word.length
  split_word = []
  while current_position < maximum_possible_position do
    current_word = set_current_word word, current_position, maximum_current_position
    if abc[current_word] != nil
      current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
      split_word.push(current_word)
    else
      maximum_current_position = update_max_current_position maximum_current_position
      current_word = set_current_word word, current_position, maximum_current_position
      if abc[current_word] != nil
        current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
        split_word.push(current_word)
      else
        maximum_current_position = update_max_current_position maximum_current_position
        current_word = set_current_word word, current_position, maximum_current_position
        if abc[current_word] != nil
          current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
          split_word.push(current_word)
        else
          puts 'This word cannot be formed in the current language'
          break
        end
      end
    end
  end
  split_word
end
def update_max_current_position max_current_position
  max_current_position = max_current_position - 1
end
def update_current_position_and_max_current_position current_position,max_current_position
  current_position = max_current_position + 1
  max_current_position = current_position + 2
  return current_position, max_current_position
end
def set_current_word word, current_position, max_current_position
  word[current_position..max_current_position]
end
puts "csobolyó => #{spell("csobolyó")}"
puts "nyirettyű => #{spell("nyirettyű")}"
puts "dzsesszmuzsikus => #{spell("dzsesszmuzsikus")}"
Output:
csobolyó => ["cs", "o", "b", "o", "ly", "ó"]
nyirettyű => ["ny", "i", "r", "e", "tty", "ű"]
dzsesszmuzsikus => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
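The same longest-match-first idea can probably be written more compactly by trying the 3-, 2- and 1-character candidates in a loop instead of nesting the checks. The following is a rough sketch, not a drop-in replacement: `ALPHABET` and `spell_longest` are hypothetical names, and it raises an `ArgumentError` for input that cannot be tokenized, whereas the code above prints a message and returns the partial result.

```ruby
# Alphabet from the question: letter => "c" (consonant) or "v" (vowel).
ALPHABET = [
  *%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
  *"eéuioöüóőúűáía".split(//).map { |z| [z, "v"] }
].to_h

# Greedy tokenizer: at each position take the longest candidate
# (3, then 2, then 1 characters) that is a key of the alphabet.
def spell_longest(word, abc = ALPHABET)
  letters = []
  pos = 0
  while pos < word.length
    # word[pos, n] is shorter than n near the end of the word, so the
    # position is advanced by the length of the letter actually found.
    letter = [3, 2, 1].map { |n| word[pos, n] }.find { |candidate| abc.key?(candidate) }
    if letter.nil?
      raise ArgumentError, "no letter matches at position #{pos} in #{word.inspect}"
    end
    letters << letter
    pos += letter.length
  end
  letters
end

p spell_longest("nyirettyű")
```

The maximum letter length of 3 is hard-coded to match this alphabet; a longer composite letter would need the candidate list extended.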
Upvotes: 1