Christofer Carlsson

Reputation: 11

Regex not finding two letter words that include Swedish letters

I am very new to regex, and I have managed to create a way to check whether a specific word exists inside a string without it just being part of another word.

Example: I am looking for the word "banana". banana == true, bananarama == false

This all works fine. However, a problem occurs when I am looking for two-letter words that contain Swedish letters (Å, Ä, Ö).

Example: I am looking for the word "på" in a string looking like this: "på påsk" and it comes back as negative. However if I look for the word "påsk" then it comes back positive. This is the regex I am using:

const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
const stringOfWords = "Färg på plagg";
console.log(doesWordExist(stringOfWords, "på"));
//Expected result: true
//Actual result: false

However, if I change the word "på" to a three-letter word, it comes back true:

const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
const stringOfWords = "Färg pås plagg";
console.log(doesWordExist(stringOfWords, "pås"));
//Expected result: true
//Actual result: true

I have been looking around for answers and have found a few questions with similar issues involving Swedish letters, but none of them look for only the whole word in its entirety. Could anyone explain what I am doing wrong?

Upvotes: 0

Views: 319

Answers (1)

logi-kal

Reputation: 7880

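The word boundary \b is defined in terms of \w, which in JavaScript is shorthand for the character class [A-Za-z0-9_]. Since "å" is not in that class, it counts as a non-word character. In "på plagg" the "å" is followed by a space, so there are non-word characters on both sides and \b does not match there. In "pås plagg" the final "s" is a word character followed by a space, so the boundary is found.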
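To see this directly, here is a small check you can run in Node or a browser console. The helper name isWordChar is only made up for this illustration:

```javascript
// \w covers only ASCII letters, digits and underscore,
// so "å" is treated as a non-word character.
const isWordChar = (ch) => /^\w$/.test(ch);

console.log(isWordChar("s")); // true
console.log(isWordChar("å")); // false

// "å" followed by a space: non-word next to non-word, so the trailing \b fails.
console.log(/\bpå\b/i.test("Färg på plagg"));   // false
// "s" followed by a space: word next to non-word, so the trailing \b matches.
console.log(/\bpås\b/i.test("Färg pås plagg")); // true
```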

To get similar behaviour, you have to re-implement the boundary check yourself, for example with a negative lookbehind and a negative lookahead (lookbehind needs an engine with ES2018 support):

const swedishCharClass = '[a-zäöå]';
const doesWordExist = (s, word) => new RegExp(
    '(?<!' + swedishCharClass + ')' + word + '(?!' + swedishCharClass + ')', 'i'
).test(s);

console.log(doesWordExist("Färg på plagg",  "på"));  // true
console.log(doesWordExist("Färg pås plagg", "pås")); // true
console.log(doesWordExist("Färg pås plagg", "på"));  // false

For more complex alphabets, I'd suggest taking a look at Concrete Javascript Regex for Accented Characters (Diacritics).
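As one possible alternative for other alphabets, here is a minimal sketch that swaps the hand-written character class for Unicode property escapes (\p{L} for letters, \p{N} for numbers), which need the u flag. The names escapeRegExp and doesWordExistUnicode are hypothetical, and escaping the search word is an extra precaution I added, not something the original code does:

```javascript
// Hypothetical helper: escape regex metacharacters so `word` is matched literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Any Unicode letter, any Unicode number, or underscore (requires the `u` flag).
const wordChar = '[\\p{L}\\p{N}_]';

// Same lookaround idea as above, but not limited to Swedish letters.
const doesWordExistUnicode = (s, word) => new RegExp(
    '(?<!' + wordChar + ')' + escapeRegExp(word) + '(?!' + wordChar + ')', 'iu'
).test(s);

console.log(doesWordExistUnicode("Färg på plagg",  "på"));     // true
console.log(doesWordExistUnicode("Färg pås plagg", "på"));     // false
console.log(doesWordExistUnicode("crème brûlée",   "brûlée")); // true
```

This assumes the text uses precomposed characters; if the input may contain combining marks, normalizing both strings with String.prototype.normalize("NFC") first is probably a good idea.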

Upvotes: 1
