Reputation: 6271
I need to remove the part of a string that starts with @.
My sample code works for one string and fails for another.
Failing case (@news4buffalo: is not removed):
String regex = "\\@\\w+ || @\\w*";
String rawContent = "RT @news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
}
Working case:
String regex = "\\@\\w+ || @\\w*";
String rawContent = "@ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
}
Output:
couldn't agree more. Good crowd last night. #LetsGoFish
Upvotes: 0
Views: 63
Reputation: 124215
You don't need to escape @, so don't add \ before it as in "\\@" (it confuses people).
Don't use a Matcher to check whether the string contains a part that should be replaced and then call replaceAll, because that scans the string a second time. Just call replaceAll from the start; if there is nothing to replace, it leaves the string unchanged. By the way, call replaceAll on the Matcher instance to avoid recompiling the Pattern.
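A minimal sketch of that advice, assuming the pattern is fixed and can be compiled once. The class name MentionCleaner, the method clean, and the placeholder pattern "@\\w+" are made up for illustration and are not from the question:

```java
import java.util.regex.Pattern;

class MentionCleaner {
    // Compiled once and reused; "@\\w+" is only a placeholder pattern for this sketch.
    private static final Pattern MENTION = Pattern.compile("@\\w+");

    static String clean(String text) {
        // Matcher.replaceAll returns the input unchanged when nothing matches,
        // so no separate find() check is needed beforehand.
        return MENTION.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("hello @bob!"));  // prints: hello !
        System.out.println(clean("no mentions"));  // prints: no mentions
    }
}
```

String.replaceAll(regex, ...) compiles the regex on every call, which is why holding on to the Pattern pays off when the same replacement runs many times.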
A regex of the form foo||bar doesn't look right. Regex uses a single pipe | to represent OR, so such a regex means foo OR the empty string OR bar. The empty string is a special case (every string contains it at the start, at the end, and between every pair of characters), so it can cause problems: "foo".replaceAll("|foo", "x") returns xfxoxox instead of treating foo as one match, because the empty alternative matches first at position 0 and the search then resumes after f, so f can no longer be the first character of a foo match :/
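To see the effect of the empty alternative, here is a small sketch. The class name EmptyAlternativeDemo and the helper replaceWithX are made up for illustration, and the tweet is shortened from the one in the question:

```java
class EmptyAlternativeDemo {
    // Replaces every match of regex in input with the letter x.
    static String replaceWithX(String input, String regex) {
        return input.replaceAll(regex, "x");
    }

    public static void main(String[] args) {
        // The empty alternative comes first and matches at every position,
        // so the foo alternative never gets a chance.
        System.out.println(replaceWithX("foo", "|foo"));    // prints: xfxoxox

        // A single pipe with two non-empty alternatives behaves as expected.
        System.out.println(replaceWithX("foo", "foo|bar")); // prints: x

        // The regex from the question has three alternatives:
        // "@\w+ " (with a trailing space), the empty string, and " @\w*".
        String regex = "\\@\\w+ || @\\w*";

        // @news4buffalo is followed by ':' rather than a space, so only the
        // empty alternative matches and the text comes back unchanged.
        System.out.println("RT @news4buffalo: Police".replaceAll(regex, ""));

        // @ZaslowShow is followed by a space, so the first alternative matches.
        System.out.println("@ZaslowShow couldn't agree more.".replaceAll(regex, ""));
    }
}
```

This also explains why the question's code worked for one input and not the other: the third alternative is never reached, because the empty one in front of it always succeeds.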
Anyway, it seems you want to match any @xxxx word, so consider something like "@\\w+", which makes sure there is at least one character after @.
You can also require that @ be the first character of a word (in case you don't want to remove the part after @ in e-mail addresses). To do this, use a look-behind such as (?<=\\s|^)@, which checks that @ is preceded by whitespace or sits at the start of the string.
You can also remove the space after the word you removed (if there is one).
So you can try
String regex = "(?<=\\s|^)@\\w*\\s?";
which for data like
RT @news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
will return
RT : Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
But if you also want to remove characters that \\w does not cover (it matches only letters, digits, and underscore), such as :, you can use \\S, which represents any non-whitespace character, so your regex can look like
String regex = "(?<=\\s|^)@\\S*\\s?";
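Here is a quick side-by-side of the two look-behind variants, as a sketch only. The class name LookBehindDemo, the helper strip, the shortened tweet, and the e-mail address are made up for illustration:

```java
import java.util.regex.Pattern;

class LookBehindDemo {
    // @ must follow whitespace or start the string; \\w* stops at the colon.
    static final Pattern WORD_ONLY = Pattern.compile("(?<=\\s|^)@\\w*\\s?");
    // Same look-behind, but \\S* also swallows punctuation such as ':'.
    static final Pattern NON_SPACE = Pattern.compile("(?<=\\s|^)@\\S*\\s?");

    static String strip(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String tweet = "RT @news4buffalo: Police say"; // shortened sample input
        String mail = "write to joe@example.com";      // hypothetical address

        System.out.println(strip(WORD_ONLY, tweet)); // prints: RT : Police say
        System.out.println(strip(NON_SPACE, tweet)); // prints: RT Police say
        // The @ in the address follows a letter, so the look-behind rejects it.
        System.out.println(strip(NON_SPACE, mail));  // prints: write to joe@example.com
    }
}
```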
Upvotes: 0
Reputation: 46841
You can also try it this way.
String s = "@ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
System.out.println(s.replaceAll("@[^\\s]*\\s+", ""));
// @[^\\s]* matches everything up to the next whitespace;
// \\s+ then removes the whitespace that follows as well
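One thing worth checking with this pattern is that \\s+ requires at least one whitespace character after the mention, so a mention at the very end of the string stays in place. A small sketch comparing it with a plain "@\\S*" follows; the class and method names and the second input are made up for illustration:

```java
class TrailingWhitespaceDemo {
    // The pattern from this answer: the mention must be followed by whitespace.
    static String withRequiredSpace(String s) {
        return s.replaceAll("@[^\\s]*\\s+", "");
    }

    // Simpler variant: removes the mention only and leaves whitespace alone.
    static String withoutSpace(String s) {
        return s.replaceAll("@\\S*", "");
    }

    public static void main(String[] args) {
        // prints: couldn't agree more.
        System.out.println(withRequiredSpace("@ZaslowShow couldn't agree more."));
        // Nothing follows the mention, so \\s+ cannot match; prints: thanks @ZaslowShow
        System.out.println(withRequiredSpace("thanks @ZaslowShow"));
        // Removes the mention but keeps the space before it.
        System.out.println(withoutSpace("thanks @ZaslowShow"));
    }
}
```

If mentions can appear at the end of the text, using \\s* instead of \\s+ would be one way to cover both cases.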
Upvotes: 1
Reputation: 72844
The regex only considers word characters, whereas your input String contains a colon (:). You can solve this by replacing \\w with \\S (any non-whitespace character) in your regex. Also, there is no need for two patterns.
String regex = "@\\S*";
Upvotes: 0
Reputation: 784998
From your question it looks like this regex can work for you:
rawContent = rawContent.replaceAll("@\\S*", "");
Upvotes: 1