membersound

Reputation: 86687

How to remove any non-alphanumeric characters?

I want to remove any non-alphanumeric character from a string, except for certain ones.

StringUtils.replacePattern(input, "\\p{Alnum}", "");

How can I also keep certain characters, such as ., - and ;?

Upvotes: 5

Views: 5877

Answers (4)

Mark Rhodes

Reputation: 10217

StringUtils uses Java's standard Pattern class under the hood. If you don't want to import Apache's library, and want it to run quicker by not compiling the regex each time it's used, you could compile the pattern once and reuse it:

private static final Pattern NO_ODD_CHARACTERS = Pattern.compile("[^a-zA-Z0-9.\\-;]+");

...

String cleaned = NO_ODD_CHARACTERS.matcher(input).replaceAll("");

Upvotes: 1

PeterK

Reputation: 1723

You could negate your expression:

\p{Alnum}

By placing it in a negative character class:

[^\p{Alnum}]

That will match any non-alphanumeric character, which you can then replace with "". If you want to allow additional characters, just append them to the character class, e.g.:

[^\p{Alnum}\s]

will not match whitespace characters (\s), so they are kept.

If you were to replace

[^\p{Alnum}.;-]

with "", the characters ., ; and - will also be allowed. The hyphen goes last in the class so that it is read as a literal rather than as a range.
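As a rough sketch of the above in plain Java (no Apache dependency), the negated class can be passed to String.replaceAll. The class and method names here are made up for illustration, and note that \p{Alnum} in Java covers ASCII letters and digits only unless Unicode character-class mode is switched on.

```java
final class AlnumFilterSketch {

    // Hypothetical helper: removes everything except ASCII letters,
    // ASCII digits, and the three extra characters . ; -
    static String keepAlnumAnd(String input) {
        // In Java source the backslash of \p{Alnum} must be doubled.
        // The hyphen is last in the class, so it is a literal hyphen.
        return input.replaceAll("[^\\p{Alnum}.;-]", "");
    }

    public static void main(String[] args) {
        // prints HelloWorldv1.2-beta;
        System.out.println(keepAlnumAnd("Hello, World! v1.2-beta;"));
    }
}
```

Accented letters such as é are stripped by this version; if you need them, you would have to widen the class yourself.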

Upvotes: 1

Philip Helger

Reputation: 1864

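You mean something like StringUtils.replacePattern(input, "[^a-z.\\-]+", "")? The backslash has to be doubled inside a Java string literal, and the dot needs no escape inside a character class; add A-Z, 0-9 and ; to the class if you want to keep those too. I don't know exactly whether StringUtils uses a special regex syntax, though.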

Upvotes: 0

mk.

Reputation: 11710

Use the negation operator ^ at the start of a character class:

[^a-zA-Z0-9.\-;]+

This means "match what is not these characters". So:

StringUtils.replacePattern(input, "[^a-zA-Z0-9.\\-;]+", "");

Don't forget to properly escape the characters that need escaping: you need to use two backslashes \\ because your regex is a Java string.
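To make the escaping point concrete, here is a small sketch that prints the regex as the engine actually sees it. It uses java.util.regex.Pattern instead of StringUtils so it runs without the Apache library; EscapeSketch and clean are illustrative names, not part of any API.

```java
import java.util.regex.Pattern;

final class EscapeSketch {

    // Two backslashes in the Java literal become one backslash in the regex.
    static final String REGEX = "[^a-zA-Z0-9.\\-;]+";

    // Hypothetical helper: strips every run of characters not in the class.
    static String clean(String input) {
        return Pattern.compile(REGEX).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                  // prints [^a-zA-Z0-9.\-;]+
        System.out.println(clean("a_b c-d.e;f#1")); // prints abc-d.e;f1
    }
}
```

Writing the class as [^a-zA-Z0-9.;-] with the hyphen last avoids the escape altogether, if you find that easier to read.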

Upvotes: 7
