Regular expression to split words with accented characters from latin

Question

I'm working on an HTML tool for studying ancient Latin. In one exercise the student has to click on individual words inside a div that contains a passage of Latin:


Cum a Romanis copiis vincĭtur măr, Gallia terra fera est. 
Regionis incŏlae terram non colunt, autem sagittis feras necant et postea eas vorant. 
Etiam a_femĭnis vita agrestis agĭtur, 
miseras vestes induunt et cum familiā in parvis casis vivunt. 
Vita secūra nimiaeque divitiae a Gallis contemnuntur. 
Gallorum civitates acrĭter pugnant et ab inimicis copiis timentur. 
Galli densis silvis defenduntur, tamen Roma feram Galliam capit.

In my JavaScript I wrap every single word in a <span class="word"> using a regex, and then attach some actions to it.

    var words = $('div.clickable');
    words.html(function(index, oldHtml) {
        // Wrap each word in a span so that clicks on it can be detected
        var myText = oldHtml.replace(/\b(\w+?)\b/g, '<span class="word">$1</span>');

        return myText;
    }).click(function(event) {
        if (!$(event.target).hasClass("word")) return;
        alert($(event.target).text());
    });

The problem is that words containing ĭ, ŏ or ā are not wrapped correctly: they are split at these characters.

How can I match this class of words correctly?


Slavik · Accepted Answer

You can split your text on dividers instead. In the common case a divider is whitespace or a punctuation mark:

(.+?)([\s,.!?;:)([\]]+)

https://regex101.com/r/xW4pF1/5
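To compare the two patterns outside the browser, here is a minimal sketch that runs them on a plain string instead of the div's HTML. The helper names `wrapByWordBoundary` and `wrapByDivider` and the square-bracket markers are assumptions made for illustration; they are not part of the original code. The sketch relies on the fact that `\w` in JavaScript only covers `[A-Za-z0-9_]`, so `\b` sees a word boundary next to ĭ and ă.

```javascript
// Hypothetical helpers for illustration only; brackets stand in for the wrapping markup.

// Pattern from the question: \w is ASCII-only, so accented letters end a "word".
function wrapByWordBoundary(text) {
  return text.replace(/\b(\w+?)\b/g, "[$1]");
}

// Pattern from this answer: a lazy run of any characters, followed by one or more dividers.
function wrapByDivider(text) {
  return text.replace(/(.+?)([\s,.!?;:)([\]]+)/g, "[$1]$2");
}

const sample = "copiis vincĭtur măr, Gallia ";

console.log(wrapByWordBoundary(sample));
// [copiis] [vinc]ĭ[tur] [m]ă[r], [Gallia] 

console.log(wrapByDivider(sample));
// [copiis] [vincĭtur] [măr], [Gallia] 
```

Note that the divider-based pattern only wraps a word when a divider follows it, so the last word of a text needs trailing punctuation or whitespace.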

Edit

var words = $('div.clickable');
words.html(function(index, oldHtml) {
    // $1 is the word, $2 the dividers that follow it; only the word is wrapped
    var myText = oldHtml.replace(/(.+?)([\s,.!?;:)([\]]+)/g, '<span class="word">$1</span>$2');

    return myText;
}).click(function(event) {
    if (!$(event.target).hasClass("word")) return;
    alert($(event.target).text());
});

https://jsfiddle.net/s568c0pp/3/
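A different technique from the one above, shown only as a sketch: in engines that support Unicode property escapes (ES2018 and later), the words themselves can be matched instead of the dividers. The helper name `wrapLatinWords` is hypothetical, the input is assumed to be plain text with no nested tags, and keeping `_` inside a word is an assumption made so that `a_femĭnis` from the sample text stays one clickable unit.

```javascript
// Hypothetical helper, not part of the original answer.
function wrapLatinWords(text) {
  // \p{L} matches any Unicode letter, \p{M} matches combining marks
  // (for accents stored as separate code points); the "u" flag is required.
  // $& inserts the whole match.
  return text.replace(/[\p{L}\p{M}_]+/gu, '<span class="word">$&</span>');
}

console.log(wrapLatinWords("a_femĭnis vita,"));
// <span class="word">a_femĭnis</span> <span class="word">vita</span>,
```

Unlike the divider-based pattern, this one does not need a divider after the last word.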
