SamiSalami

Reputation: 667

Replacing umlauts in JS

I am comparing strings and have to replace umlauts in JS, but it seems JS does not recognize the umlauts in the strings. The text comes from the database and in the browser the umlauts do show fine.

function replaceUmlauts(string)
{
    let value = string.toLowerCase();
    value = value.replace(/ä/g, 'ae');
    value = value.replace(/ö/g, 'oe');
    value = value.replace(/ü/g, 'ue');
    return value;
}

As search patterns I tried:

To make sure the problem is not in the replace function, I also tried indexOf:

console.log(value.indexOf('ä'));

But the output for every pattern is -1.

So I guess it is some kind of encoding problem, but as I said, the umlauts look fine on the page.

Any ideas? This seems so simple...

EDIT: Even though I found my answer, the problem was not really solved "at the root" (the encoding). This is my page encoding:

<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

The database has: utf8_general_ci

Seems totally alright to me.

Upvotes: 27

Views: 81830

Answers (5)

mfrosch

Reputation: 141

If you need a little snippet to convert German umlauts to HTML character entities, here you go:

function fixUmlauts(value) {
    value = value.replace(/ä/g, '&auml;');
    value = value.replace(/ö/g, '&ouml;');
    value = value.replace(/ü/g, '&uuml;');
    value = value.replace(/ß/g, '&szlig;');
    value = value.replace(/Ä/g, '&Auml;');
    value = value.replace(/Ö/g, '&Ouml;');
    value = value.replace(/Ü/g, '&Uuml;');
    return value;
}

Upvotes: 4

Andreas Richter

Reputation: 788

If you want to replace German umlauts while respecting the case, use this (open source, written by me, happy to share):

const umlautMap = {
  '\u00dc': 'UE',
  '\u00c4': 'AE',
  '\u00d6': 'OE',
  '\u00fc': 'ue',
  '\u00e4': 'ae',
  '\u00f6': 'oe',
  '\u00df': 'ss',
}

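function replaceUmlaute(str) {
  return str
    // Capital umlaut followed by a lowercase letter: capitalize only the first letter (Ü -> Ue).
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // Everything else maps directly. No '|' inside the character class,
    // otherwise a literal pipe in the input would match as well.
    .replace(new RegExp('[' + Object.keys(umlautMap).join('') + ']', 'g'),
      (a) => umlautMap[a]
    );
}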

const test = ['Übung', 'ÜBUNG', 'üben', 'einüben', 'EINÜBEN', 'Öde ätzende scheiß Übung']
test.forEach((str) => console.log(str + " -> " + replaceUmlaute(str)))

It will:

  • Übung -> Uebung
  • ÜBUNG -> UEBUNG
  • üben -> ueben
  • einüben -> einueben
  • EINÜBEN -> EINUEBEN
  • and the same for Ä, Ö
  • and simple ß -> ss

Upvotes: 29

Oleg V. Volkov

Reputation: 22421

Either ensure that your script's encoding is correctly specified (in the <script> tag, or in the page's header/meta if the script is embedded), or specify the characters with the \uNNNN syntax, which always resolves unambiguously to a specific Unicode code point.

For example:

str.replace(/\u00e4/g, "ae")

This will always replace ä with ae, no matter what encoding is set for your page/script, even if it is incorrect.

Here are the codes needed for German:

// Ü, ü     \u00dc, \u00fc
// Ä, ä     \u00c4, \u00e4
// Ö, ö     \u00d6, \u00f6
// ß        \u00df
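
A minimal sketch of how these escapes could be combined into one function. The names UMLAUT_ESCAPES and replaceUmlautsEscaped are illustrative and not part of the answer above. Because the source contains only ASCII characters, the result should not depend on the encoding the script is served with.

```javascript
// Hypothetical helper: every non-ASCII character is written as a \uNNNN
// escape, so the file's encoding cannot change which characters are matched.
const UMLAUT_ESCAPES = {
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00fc': 'ue', // ü
  '\u00df': 'ss', // ß
};

function replaceUmlautsEscaped(str) {
  // Lowercase first (as in the question), then replace in a single pass.
  return str
    .toLowerCase()
    .replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => UMLAUT_ESCAPES[ch]);
}

console.log(replaceUmlautsEscaped('\u00dcbung macht Spa\u00df')); // uebung macht spass
```

If this version still finds nothing, the string probably does not contain these code points at all, and the actual character codes need to be inspected.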

Upvotes: 62

Fidel Gonzo

Reputation: 574

Here's a function that replaces the most common characters to produce a Google-friendly SEO URL:

function deUmlaut(value){
  value = value.toLowerCase();
  value = value.replace(/ä/g, 'ae');
  value = value.replace(/ö/g, 'oe');
  value = value.replace(/ü/g, 'ue');
  value = value.replace(/ß/g, 'ss');
  value = value.replace(/ /g, '-');
  value = value.replace(/\./g, '');
  value = value.replace(/,/g, '');
  value = value.replace(/\(/g, '');
  value = value.replace(/\)/g, '');
  return value;
}

Upvotes: 14

Larry K

Reputation: 49114

You first need to figure out the character codes of the characters you're trying to replace. For example, depending on the character encoding, the characters could be in ISO 8859, UTF-8 or something else. They could also be HTML character entities such as "&auml;".

Rather than guessing, print them out.

And beware that your incoming data may not use the same character set/character encoding consistently--you need to check on where the data is coming from.

So look at the incoming data by using string.charCodeAt.

Check the character code before the toLowerCase to ensure that it is not changing things on you. You'll need to debug step by step.
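
One way to print the codes is sketched below. dumpCharCodes is a hypothetical helper name, and the decomposed form used as sample input is only one possible cause, assumed here for illustration; the question does not confirm it.

```javascript
// Hypothetical debugging helper: list the UTF-16 code unit at every position
// of a string as hex, so invisible differences become visible.
function dumpCharCodes(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0');
    codes.push('U+' + hex);
  }
  return codes.join(' ');
}

const precomposed = '\u00e4'; // ä as a single code point
const decomposed = 'a\u0308'; // a plus combining diaeresis; renders the same

console.log(dumpCharCodes(precomposed));      // U+00E4
console.log(dumpCharCodes(decomposed));       // U+0061 U+0308
console.log(decomposed.indexOf(precomposed)); // -1
console.log(decomposed.normalize('NFC').indexOf(precomposed)); // 0
```

If the dump shows a base letter followed by U+0308, calling normalize('NFC') on the incoming string before replacing may be enough. If it shows something else entirely, the data is most likely being decoded with the wrong encoding somewhere upstream.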

Finally, check the character set settings in your editor to ensure that your typed ä is what it should be. You may want to specify it via its Unicode escape (for example \u00e4) rather than typing ä, ö, etc.

Upvotes: 2
