Reputation: 2464
How can I remove an email address from a string, along with all digits and special characters?
A sample string could be
"Hello world my # is 123 mail me @ [email protected]"
The output string should be
"Hello world my is mail me"
I googled this and found that I can use the following regular expression
"[^A-Za-z0-9\\.\\@_\\-~#]+"
but that example was more about checking for valid email IDs than removing them. I am new to Java!
Upvotes: 0
Views: 5527
Reputation: 570615
As pointed out by others, you could use regular expressions to clean up your String and replace the unwanted parts with an empty string "". To do so, have a look at the replaceAll(String regex, String replacement) method of the String class and at the Pattern class for the syntax of regular expressions in Java.
Below is some code demonstrating one way to clean the provided sample String (maybe not the most elegant, though):
String input = "Hello world my # is 123 mail me @ [email protected]";
String EMAIL_PATTERN = "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";
String output = input
    // Replace emails by an empty string
    .replaceAll(EMAIL_PATTERN, "")
    // Replace all punctuation. One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    .replaceAll("\\p{Punct}", "")
    // Replace any digit by an empty string
    .replaceAll("\\d", "")
    // Replace any Blank (a space or a tab) repeated more than once by a single space
    .replaceAll("\\p{Blank}{2,}+", " ");
System.out.println(output);
Running this code produces the following output:
Hello world my is mail me
If you need to remove more garbage (or less, like punctuation), well, you've got the principle. Adapt it to suit your needs.
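For anyone who wants to paste and run the whole thing, here is a minimal self-contained sketch of the same chain. The class name CleanupDemo, the method name clean, and the address user@example.com are placeholders I made up for illustration, and the trailing trim() is my own addition to drop the space left behind when the address sits at the end of the string.

```java
// A sketch of the cleanup chain described above; names are hypothetical.
final class CleanupDemo {
    // Same email pattern as in the answer.
    private static final String EMAIL_PATTERN =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    static String clean(String input) {
        return input
                .replaceAll(EMAIL_PATTERN, "")        // remove email addresses
                .replaceAll("\\p{Punct}", "")         // remove ASCII punctuation
                .replaceAll("\\d", "")                // remove digits
                .replaceAll("\\p{Blank}{2,}+", " ")   // collapse runs of spaces/tabs
                .trim();                              // assumption: also drop edge whitespace
    }

    public static void main(String[] args) {
        // user@example.com is a placeholder address, not the one from the question.
        String input = "Hello world my # is 123 mail me @ user@example.com";
        System.out.println(clean(input)); // prints: Hello world my is mail me
    }
}
```

The order matters here: the email has to go first, because once the punctuation pass has stripped the @ and the dots, the address is no longer recognizable.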
Upvotes: 5
Reputation: 14265
From your example, it looks like it's not just email addresses you're interested in removing but all non-alpha characters, so this is trivial:
str = str.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "")
.replaceAll("[^\\p{Alpha} ]", "")
.replaceAll("[ ]{2,}+", " ");
See the Pattern JavaDocs for information about what the special character class \p{Alpha} means...
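One thing worth knowing about \p{Alpha}: by default in Java it is the POSIX class and only covers ASCII letters, so accented letters get stripped as well. Below is a small sketch comparing it with the Unicode letter category \p{L}. The class and method names are made up for the demo, and whether you want the Unicode variant depends on your input.

```java
// Hypothetical demo class comparing ASCII-only and Unicode-aware letter classes.
final class AlphaClassDemo {
    // Keeps only ASCII letters and spaces (the approach from the answer).
    static String asciiOnly(String s) {
        return s.replaceAll("[^\\p{Alpha} ]", "");
    }

    // Keeps any Unicode letter and spaces.
    static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{L} ]", "");
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 42!"; // "café 42!"
        System.out.println("[" + asciiOnly(sample) + "]");    // "caf " : the accented letter is dropped too
        System.out.println("[" + unicodeAware(sample) + "]"); // "café " : only digits and punctuation are dropped
    }
}
```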
Upvotes: 0
Reputation: 1109695
You can use String#replaceAll() for this. Just let it replace any regex matches with an empty string "". The regex you mentioned is, however, not very robust. A better one is this (copied from here and slightly changed for use in plain vanilla text):
string = string.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "");
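If you call this often, note that String#replaceAll() compiles the regex on every call. A possible variant, sketched below, compiles the pattern once and reuses it through a Matcher. The class name EmailRemover, the method name removeEmails, and the example.com addresses are placeholders for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper that reuses one compiled Pattern for the same regex.
final class EmailRemover {
    private static final Pattern EMAIL = Pattern.compile(
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)");

    // Returns the text with every match of the email pattern removed.
    static String removeEmails(String text) {
        return EMAIL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Placeholder addresses; surrounding whitespace is left untouched.
        System.out.println(removeEmails("mail me at user@example.com please"));
        System.out.println(removeEmails("cc first.last@mail.example.org"));
    }
}
```

The result is the same as with the one-liner; the spaces around the removed address stay in place, so you may still want to collapse them afterwards.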
Hope this helps.
Upvotes: 2
Reputation: 272427
Check out the Java regular expression Pattern class and its uses. There's a useful tutorial here which includes replacement methods.
An aside: this is a particularly robust regexp to use for RFC822-compliant email addresses :-) You should be able to come up with something more concise for your needs! There's a discussion of email regexps and trade-offs here.
Upvotes: 1