Reputation: 2118
I'm working on an HTML tool for studying Latin.
In one exercise, students have to click on individual words
inside a div
that contains a passage of Latin:
<div class="clickable">
Cum a Romanis copiis vincĭtur măr, Gallia terra fera est.
Regionis incŏlae terram non colunt, autem sagittis feras necant et postea eas vorant.
Etiam a_femĭnis vita agrestis agĭtur,
miseras vestes induunt et cum familiā in parvis casis vivunt.
Vita secūra nimiaeque divitiae a Gallis contemnuntur.
Gallorum civitates acrĭter pugnant et ab inimicis copiis timentur.
Galli densis silvis defenduntur, tamen Roma feram Galliam capit.
</div>
In my JavaScript I wrap every word in a <span>
using a regex, and then attach a click handler.
var words = $('div.clickable');
words.html(function(index, oldHtml) {
var myText = oldHtml.replace(/\b(\w+?)\b/g, '<span class="word">$1</span>')
return myText;
}).click(function(event) {
if(!$(event.target).hasClass("word"))return;
alert($(event.target).text());
});
The problem is that words containing ĭ, ŏ, ā
and similar characters are not wrapped correctly: they are split at these characters.
How can I correctly match words like these?
Upvotes: 1
Views: 665
Reputation: 6837
You can split the text on delimiters instead of matching word characters. Typically the delimiters are whitespace or punctuation marks:
(.+?)([\s,.!?;:)([\]]+)
https://regex101.com/r/xW4pF1/5
Edit
var words = $('div.clickable');
words.html(function(index, oldHtml) {
var myText = oldHtml.replace(/(.+?)([\s,.!?;:)([\]]+)/g, '<span class="word">$1</span>$2')
return myText;
}).click(function(event) {
if(!$(event.target).hasClass("word"))return;
alert($(event.target).text());
});
https://jsfiddle.net/s568c0pp/3/
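For reference, here is a minimal sketch of the same delimiter idea as a plain function, without jQuery. The name `wrapByDelimiter` is hypothetical, and the `|$` alternative is my own addition, not part of the pattern above: without it, a final word that has no delimiter after it is left unwrapped. The sketch assumes the input is plain text with no nested HTML tags.

```javascript
// Hypothetical helper (not from the answer above): wraps every run of
// non-delimiter characters in a span, keeping the delimiters outside it.
// Assumption: `text` contains no HTML tags of its own.
function wrapByDelimiter(text) {
  // Group 1: the word (lazy, at least one character).
  // Group 2: one or more delimiters, or the end of the string.
  return text.replace(
    /(.+?)([\s,.!?;:)([\]]+|$)/g,
    '<span class="word">$1</span>$2'
  );
}

// \u012D is ĭ and \u0103 is ă, written as escapes to avoid encoding issues.
console.log(wrapByDelimiter('vinc\u012Dtur m\u0103r, Gallia'));
```

Because the pattern never looks at what a "word character" is, accented letters need no special handling. The trade-off is that any punctuation missing from the character class ends up inside the span.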
Upvotes: 4
Reputation: 61
The \w metacharacter matches only ASCII word characters: a-z, A-Z, 0-9, and the _ (underscore) character.
So you need to change your regex to use a range of Unicode symbols instead of \w.
You can also try \p{L} instead of \w to match any Unicode letter (in JavaScript this requires the u flag, available since ES2018).
See also: http://www.regular-expressions.info/unicode.html
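A rough sketch of the \p{L} approach, assuming an engine that supports Unicode property escapes (ES2018 or later). The helper name `wrapWords` is hypothetical. The first log line shows why the original \w pattern splits the word.

```javascript
// Hypothetical helper: wraps each run of Unicode letters in a span.
// The `u` flag is required for \p{L} to be recognised.
function wrapWords(text) {
  // $& in the replacement string stands for the whole matched substring.
  return text.replace(/\p{L}+/gu, '<span class="word">$&</span>');
}

// \u012D is ĭ as a single precomposed code point.
const sample = 'vinc\u012Dtur';

// \w is ASCII-only, so the original pattern stops at ĭ.
console.log(sample.match(/\b(\w+?)\b/g)); // [ 'vinc', 'tur' ]

// With \p{L} the whole word is matched as one unit.
console.log(wrapWords(sample));
```

Note that \p{L} covers letters only, so digits and underscores are no longer treated as part of a word. If the text may contain decomposed characters (a base letter followed by a combining mark), one option is to call `text.normalize('NFC')` first or to match `[\p{L}\p{M}]+` instead.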
Upvotes: 1