Leo

Reputation: 5783

Capitalize Cyrillic strings with JavaScript

I'm writing an AngularJS filter that capitalizes the first letter of each word. It works well with a-zA-Z letters, but I also use Cyrillic characters and would like it to work with them too.

var strLatin = "this is some string";
var strCyrillic = "това е някакъв низ";

var newLatinStr = strLatin.replace(/\b[\wа-яА-Я]/g, function(l){ 
    return l.toUpperCase();
});

var newCyrillicStr = strCyrillic.replace(/\b[\wа-яА-Я]/g, function(l){ 
    return l.toUpperCase();
});

Here is a CodePen example: http://codepen.io/brankoleone/pen/GNxjRM

Upvotes: 2

Views: 1146

Answers (3)

GilZ

Reputation: 6477

If you use Lodash, you can use _.startCase instead of your own implementation. It works by splitting the string into words, capitalizing the first character of each word, and then joining them back together.

Upvotes: 1

Gennady Grishkovtsov

Reputation: 818

Try this:

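function capitalizer(string) {
  // Split on single whitespace characters, uppercase the first character
  // of each piece, then join the pieces with spaces.
  return string.split(/\s/).map(function(item) {
    return item.charAt(0).toUpperCase() + item.slice(1);
  }).join(' ');
}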

Example
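As a quick check of the function above, the sketch below restates it so the snippet runs on its own and applies it to the two strings from the question. The variable names latin, cyrillic and hyphenated are illustrative only and do not come from the original post. Because the function splits on whitespace alone, a letter that follows a hyphen stays lowercase, and each whitespace character (a tab, for instance) comes back as a plain space.

```javascript
// Restated from the answer above so that this snippet is self-contained.
function capitalizer(string) {
  return string.split(/\s/).map(function (item) {
    return item.charAt(0).toUpperCase() + item.slice(1);
  }).join(' ');
}

// Illustrative inputs; the first two strings come from the question.
var latin = capitalizer("this is some string");
var cyrillic = capitalizer("това е някакъв низ");
// Only whitespace separates words here, so the part after "-" is left as is.
var hyphenated = capitalizer("иван-петров");

console.log(latin);
console.log(cyrillic);
console.log(hyphenated);
```

If punctuation such as hyphens should also start a new word, a regex-based approach with an explicit word boundary is probably a better fit than splitting on whitespace.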

Upvotes: 0

Wiktor Stribiżew

Reputation: 627082

You need a custom word boundary that you may build using groupings:

var strLatin = "this is some string";
var strCyrillic = "това е някакъв низ";
var block = "\\w\\u0400-\\u04FF";
var rx = new RegExp("([^" + block + "]|^)([" + block + "])", "g");

var newLatinStr = strLatin.replace(rx, function($0, $1, $2){ 
    return $1+$2.toUpperCase();
});
console.log(newLatinStr);
var newCyrillicStr = strCyrillic.replace(rx, function($0, $1, $2){ 
    return $1+$2.toUpperCase();
});
console.log(newCyrillicStr);

Details:

  • The block contains all ASCII letters, digits and the underscore (\w), plus every character in the basic Cyrillic range (U+0400 to U+04FF). If you need more, see the Wikipedia article on the Cyrillic script in Unicode and update the regex accordingly. If you only want to match Russian letters, А-ЯЁёа-я is enough: use var block = "\\wА-ЯЁёа-я";
  • The final regex matches and captures into Group 1 either the start of the string or any char that is not defined in the block, and then matches and captures into Group 2 any char defined in the block. The replacement callback puts Group 1 back unchanged and uppercases Group 2.
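The reason a custom boundary is needed is that \b is defined in terms of \w, and \w in JavaScript covers only ASCII letters, digits and the underscore, so there is no \b position anywhere in a purely Cyrillic string. As a variation on the same grouping idea, the sketch below swaps the explicit \u0400-\u04FF range for Unicode property escapes. This is not the code from the answer: it assumes a runtime that supports the u flag with \p{...} (ES2018 or later), and the function name capitalizeWords is hypothetical.

```javascript
// A sketch only; assumes ES2018+ support for \p{...} with the "u" flag.
// capitalizeWords is a hypothetical name, not part of the original answer.
function capitalizeWords(text) {
  // Group 1: start of string, or one char that is not a letter, digit or "_".
  // Group 2: the letter that starts the word, in any script.
  var rx = /(^|[^\p{L}\p{N}_])(\p{L})/gu;
  return text.replace(rx, function (match, before, letter) {
    // Keep the boundary char untouched and uppercase only the letter.
    return before + letter.toUpperCase();
  });
}

var latinResult = capitalizeWords("this is some string");
var cyrillicResult = capitalizeWords("това е някакъв низ");

console.log(latinResult);
console.log(cyrillicResult);
```

The trade-off is browser support: the explicit range in the answer works in older engines, while \p{L} covers letters outside the basic Cyrillic block without further edits to the pattern.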

Upvotes: 1
