I have the following text:
a phrase whith length one, which is "uno"
Using the following dictionary,
1) phrase --- frase
2) a phrase --- una frase
3) one --- uno
4) uno --- one
I'm trying to replace the occurrences of the dictionary items in the text. The desired output is:
[a phrase|una frase] whith length [one|uno], which is "[uno|one]"
I've done this:
dictionary = { "phrase" => "frase", "a phrase" => "una frase", "one" => "uno", "uno" => "one" }
text = %(a phrase whith length one, which is "uno")
dictionary.each do |original, translation|
  text.gsub! original, "[#{original}|#{translation}]"
end
After each dictionary entry is applied, the text looks like this:
1) a [phrase|frase] whith length one, which is "uno"
2) a [phrase|frase] whith length one, which is "uno"
3) a [phrase|frase] whith length [one|uno], which is "uno"
4) a [phrase|frase] whith length [one|[uno|one]], which is "[uno|one]"
I see two problems here:
1) phrase is being replaced instead of a phrase. I think this can be fixed by sorting the dictionary by length, giving priority to longer terms.
2) uno is replaced again inside [one|uno], which is itself the result of an earlier replacement. I thought of using some sort of regular expression list (with Regexp.union), but I don't know how efficient and clean that would be.
Any ideas?
To solve your second problem, you have to do all the replacements in a single pass, so that text which has already been replaced is never scanned again.
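A quick way to see why a single pass is needed: the sketch below keeps the question's one-gsub!-per-entry loop but first sorts the dictionary longest-first, as proposed in the question. It assumes the dictionary is a Ruby Hash in the order listed in the question; sequential_replace is a hypothetical helper name. The output shows that sorting alone does not even fully fix the first problem, because phrase is then found again inside the already inserted [a phrase|una frase], and uno is still rewritten inside [one|uno].

```ruby
# Dictionary from the question, in the listed order (assumed to be a Hash).
dictionary = {
  "phrase"   => "frase",
  "a phrase" => "una frase",
  "one"      => "uno",
  "uno"      => "one",
}

# Hypothetical helper: one gsub! per entry, exactly like the question's loop.
def sequential_replace(text, pairs)
  result = text.dup
  pairs.each do |original, translation|
    result.gsub!(original, "[#{original}|#{translation}]")
  end
  result
end

text = %(a phrase whith length one, which is "uno")

# Longest terms first; the index keeps equal-length terms in their original order.
sorted = dictionary.sort_by.with_index { |(original, _), i| [-original.length, i] }

puts sequential_replace(text, sorted)
# prints: [a [phrase|frase]|una frase] whith length [one|[uno|one]], which is "[uno|one]"
```

Each gsub! call scans the whole string again, including brackets produced by earlier calls, which is why the nesting appears no matter how the entries are ordered.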
Convert the dictionary into a hash that maps each term to its full replacement string, with the key-value pairs in the order you mention (sorted by length, longest first, perhaps).
dictionary = {
  "a phrase" => "[a phrase|una frase]",
  "phrase"   => "[phrase|frase]",
  "one"      => "[one|uno]",
  "uno"      => "[uno|one]",
}
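Rather than writing a hash like the one above by hand, it can be generated from plain original => translation pairs. This is a sketch assuming the input is a Ruby Hash in the question's order; build_replacements is a hypothetical name. Sorting longest-first follows the question's suggestion, and the index in the sort key makes the sort stable, so equal-length terms such as one and uno keep their original order.

```ruby
# Hypothetical helper: turn original => translation pairs into
# original => "[original|translation]", with longer originals first.
def build_replacements(dictionary)
  dictionary
    .sort_by.with_index { |(original, _), i| [-original.length, i] }
    .map { |original, translation| [original, "[#{original}|#{translation}]"] }
    .to_h
end

dictionary = {
  "phrase"   => "frase",
  "a phrase" => "una frase",
  "one"      => "uno",
  "uno"      => "one",
}

p build_replacements(dictionary)
```

Ruby hashes preserve insertion order, so the sorted order carries over into the keys used to build the regex.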
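Then replace all of them in a single pass. The word boundaries must be written as regex syntax: in a double-quoted string "\b" is a backspace character, and Regexp.union treats string arguments as literal text, so "\bone\b" would never match anything.
# \b here is a regex word boundary; Regexp.union escapes each key
text.gsub(/\b(?:#{Regexp.union(dictionary.keys)})\b/, dictionary)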
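Putting the pieces together, here is a self-contained sketch (the method name annotate is hypothetical) that builds one pattern from the hash keys, with the word boundaries in a regex literal, and does the replacement in one gsub call. Because gsub scans the original string once and never re-examines text it has already replaced, the uno inside [one|uno] is left alone, and the \b anchors stop a key such as one from matching inside a longer word like someone.

```ruby
# Hypothetical helper: single-pass replacement driven by a hash of
# key => replacement string.
def annotate(text, replacements)
  # Regexp.union escapes each key; \b restricts matches to whole words.
  pattern = /\b(?:#{Regexp.union(replacements.keys)})\b/
  # With a Hash as the second argument, gsub replaces each match
  # with the value stored under the matched text.
  text.gsub(pattern, replacements)
end

replacements = {
  "a phrase" => "[a phrase|una frase]",
  "phrase"   => "[phrase|frase]",
  "one"      => "[one|uno]",
  "uno"      => "[uno|one]",
}

text = %(a phrase whith length one, which is "uno")
puts annotate(text, replacements)
# prints: [a phrase|una frase] whith length [one|uno], which is "[uno|one]"

puts annotate("someone said one", replacements)
# prints: someone said [one|uno]
```

The order of the keys only matters when two of them can match at the same starting position (for example a term and a longer term that begins with it); listing longer keys first makes the regex prefer the longer one.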