user1213444

Reputation: 21

Pattern matching for Swedish characters

I need help with a regular expression.

I have to match a string like this: âãa34dc

The pattern I have used:

\s*[a-zA-Z]+[a-zA-Z_0-9]*\s

But this pattern fails to match strings such as âãa34dc.

P.S. â and ã are Swedish characters.

Please help me find the correct pattern for this kind of string.

Upvotes: 1

Views: 2317

Answers (3)

David Yaw

Reputation: 27874

Do you actually want to restrict it to Swedish characters? In other words, should a German character not match? If so, then you'll probably have to enumerate the whole alphabet, and include that.

If what you really want is to match every alphabetic character, use the regular expression terms for matching all letters.

\w matches any word character, but that includes digits and some punctuation. That's close, but not exactly what you want for your second term.

For the first term, where you don't want to include numbers, specifying that the character should be a Unicode 'letter' class will work. \p{L} specifies all Unicode characters that are a letter. This includes [a-zA-Z], and all the Swedish characters, and German, and Russian, etc.

Therefore, I think this regular expression is what you want:

\s*[\p{L}][\p{L}_0-9]*\s

If you want to include digits from other character sets, and some other punctuation, then you can use [\w]* for the second term.
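As a rough illustration of what the Unicode letter class covers, here is a minimal sketch in Python. The thread does not name a language, so Python is an assumption, and because Python's standard `re` module does not accept \p{L}, the sketch checks the Unicode general category directly instead of using a regex. The helper names `is_letter` and `is_identifier_like` are made up for this example.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    # Unicode general categories Lu, Ll, Lt, Lm and Lo all start with "L";
    # together they are what \p{L} refers to.
    return unicodedata.category(ch).startswith("L")

def is_identifier_like(s: str) -> bool:
    # Hypothetical helper mirroring the core of the pattern above:
    # one letter, then any mix of letters, underscores and ASCII digits.
    if not s or not is_letter(s[0]):
        return False
    return all(is_letter(c) or c == "_" or c in "0123456789" for c in s[1:])

sample = "\u00e2\u00e3a34dc"  # the string from the question, written with escapes

print(is_identifier_like(sample))   # True
print(is_identifier_like("34dc"))   # False: starts with a digit
```

In an engine that supports \p{L} natively (for example .NET or Java), the regular expression itself does this work and no helper is needed.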

Upvotes: 3

Douglas

Reputation: 54897

John Machin provides a great answer for this. Adapting his pattern, what you need is probably something similar to: \s*[^\W\d_]\w*\s*

P.S. I removed the + quantifier from your first part. Any subsequent letters would be matched by the subsequent quantified \w.
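A quick way to try this pattern, sketched in Python 3 (an assumption, since the question does not say which regex engine is in use). In Python 3, \w is Unicode-aware for `str` patterns, so `[^\W\d_]` means "a word character that is neither a digit nor an underscore", which is to say a letter. The names `TOKEN`, `ASCII_ONLY` and `matches` are illustrative only.

```python
import re

# Unicode-aware pattern adapted above: a letter, then any word characters.
TOKEN = re.compile(r"\s*[^\W\d_]\w*\s*")

# The ASCII-only pattern from the question, for comparison
# (trailing \s relaxed to \s* so both patterns are tested the same way).
ASCII_ONLY = re.compile(r"\s*[a-zA-Z]+[a-zA-Z_0-9]*\s*")

def matches(pattern: re.Pattern, s: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return pattern.fullmatch(s) is not None

sample = "\u00e2\u00e3a34dc"  # the string from the question, written with escapes

print(matches(TOKEN, sample))       # True
print(matches(ASCII_ONLY, sample))  # False: the first two letters are outside a-z
print(matches(TOKEN, "34dc"))       # False: the first character is a digit
```

Other engines treat \w differently (some keep it ASCII-only unless a Unicode flag is set), so the result should be checked against the engine actually in use.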

Upvotes: 0

Royi Namir

Reputation: 148664

Please give a set of rules.

According to your question:

    [X-Ya-zA-Z]{3}[0-9]{2}[a-zA-Z]{2}

Replace X with the first Swedish letter.

Replace Y with the last Swedish letter.

Upvotes: 0
