Reputation: 2039
I have a string like this:
−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou "výborné" <značky>?!.
And I want to end up with this:
häagen-dazs a le & co jsou výborné značky.
Unlike in How to filter string for unwanted characters using regex?, I want to keep accents (diacritics) in the string.
I use the following replaceAll:
str.replaceAll("[¨%=;\\:\\(\\)\\$\\[\\]\\{\\}\\<\\>\\+\\*\\−\\@\\#\\~\\?\\!\\^\\'\\\"\\|\\/]", "");
Upvotes: 2
Views: 3674
Reputation: 32145
You can loop through all the characters of the input String and test each one individually against a regex of allowed characters, keeping it only if it matches. Use this regex for each character:
[a-zA-Z& \\-_\\.ýčéèêàâùû]
This is the code you need:
class AllowedChars {
    public static void main(String[] args) {
        String input = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou výborné <značky>?!";
        StringBuffer sb = new StringBuffer();
        // Keep a character only if it matches the allowed-characters regex
        for (char c : input.toCharArray()) {
            if ((Character.toString(c).toLowerCase()).matches("[a-zA-Z& \\-_\\.ýčéèêàâùû]")) {
                sb.append(c);
            }
        }
        System.out.println(sb.toString());
    }
}
Demo:
Here is a working demo that uses this code and gives the following output:
-hagen-dazs. a le & co jsou výborné značky
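Note that ä is missing from the allowed set above, which is why the output reads hagen instead of häagen. A variation on the same loop, sketched below, swaps the hardcoded list of accented letters for Character.isLetterOrDigit, so any accented letter survives. The class name KeepLettersLoop and the helper keepAllowed are hypothetical, the sketch assumes &, . and - are the only symbols you want to keep, and it uses StringBuilder because no synchronization is needed.

```java
public class KeepLettersLoop {

    // Hypothetical helper: keeps letters of any script (including ä, ý, č, ž),
    // digits, whitespace, and the assumed extra symbols & . -
    static String keepAllowed(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i); // read whole code points, not single chars
            if (Character.isLetterOrDigit(cp)
                    || Character.isWhitespace(cp)
                    || cp == '&' || cp == '.' || cp == '-') {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou výborné <značky>?!";
        System.out.println(keepAllowed(input));
    }
}
```

With the same input this prints -häagen-dazs a le & co jsou výborné značky; the leading hyphen is the ASCII - from the input, which is in the allowed set.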
Note:
input.toCharArray() - gets an array of chars to loop over.
(Character.toString(c).toLowerCase()).matches("[a-zA-Z& \\-_\\.ýčéèêàâùû]") - tests whether the current char is one of the allowed characters.
StringBuffer - builds a new String containing only the allowed characters.
Upvotes: 1
Reputation: 626893
You need to use
String res = input.replaceAll("(?U)[^\\p{L}\\p{N}\\s&.-]+", "");
Note that the regex matches any character other than the following (because [^...] is a negated character class), one or more times (due to the + quantifier):
\p{L} - any Unicode letter
\p{N} - any Unicode number character (digits and other numerals)
\s - any Unicode whitespace (\s becomes Unicode-aware because of (?U), the inline modifier version of Pattern.UNICODE_CHARACTER_CLASS)
& - a literal &
. - a literal .
- - a literal hyphen (it is literal because it is placed at the end of the character class)

class Rextester
{
    public static void main(String[] args)
    {
        String input = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou výborné <značky>?!";
        // Remove every run of characters that are not letters, digits, whitespace, &, . or -
        input = input.replaceAll("(?U)[^\\p{L}\\p{N}\\s&.-]+", "");
        System.out.println(input);
    }
}
Output: -häagen-dazs a le & co jsou výborné značky
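Applied to the exact input from the question (with the quotes and the final period), the same pattern leaves -häagen-dazs a le & co jsou výborné značky., which differs from the desired output only by the leading hyphen left over from −+-~. A minimal sketch of one way to close that gap, assuming hyphens should be kept only when they sit between two letters, adds a second replaceAll with lookarounds. The class ExactOutput, its constants and clean are hypothetical names.

```java
public class ExactOutput {

    // Step 1: the pattern from this answer, removing runs of anything that is
    // not a letter, digit, whitespace, &, . or -
    static final String DISALLOWED = "(?U)[^\\p{L}\\p{N}\\s&.-]+";

    // Step 2 (hypothetical extra step): remove a hyphen that is not preceded
    // by a letter or not followed by one, so only in-word hyphens survive
    static final String STRAY_HYPHEN = "(?<!\\p{L})-|-(?!\\p{L})";

    static String clean(String input) {
        String kept = input.replaceAll(DISALLOWED, "");
        return kept.replaceAll(STRAY_HYPHEN, "");
    }

    public static void main(String[] args) {
        // The exact input from the question, including the quotes and the final period
        String input = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou \"výborné\" <značky>?!.";
        System.out.println(clean(input));
    }
}
```

This prints häagen-dazs a le & co jsou výborné značky., matching the desired output; the hyphen in häagen-dazs stays because it has a letter on both sides.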
Upvotes: 1
Reputation: 472
Try this:
str = str.replaceAll("[\\\\/.:%!\\[\\](){}?^*+\"'#@$;¨=<>~−]", "");
Your regex had something wrong with its syntax. I suggest building your regex step by step so that you find out immediately if there's a mistake.
Try using this site for testing regexes in real time; it's very good.
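A minimal sketch of that step-by-step approach using only java.util.regex: compile each candidate with Pattern.compile, catch PatternSyntaxException to see what is wrong and where, and test the class on sample characters. The class RegexCheck and its compiles helper are hypothetical. The last check shows a pitfall that compiles without any error: an unescaped hyphen between two characters, as in <>-~, forms a range (here from > to ~), which covers every ASCII letter.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {

    // Hypothetical helper: returns false and prints the reason instead of
    // throwing, so each piece of a character class can be checked as it is added
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Bad pattern: " + e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[<>~-]")); // true: a trailing '-' is a literal hyphen
        System.out.println(compiles("[~-<]"));  // false: '~' to '<' is an illegal (reversed) range
        // Compiles fine, but '>' (0x3E) to '~' (0x7E) includes every ASCII letter
        System.out.println("abc".replaceAll("[<>-~]", "").isEmpty()); // true
    }
}
```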
Upvotes: 0