bharathi

Reputation: 6271

How to remove the @ part of a string using Pattern in Java

I need to remove a part of the string which starts with @.

My sample code works for one string and fails for another.

Failing case (@news4buffalo: is not removed):

String regex = "\\@\\w+ || @\\w*";
String rawContent =  "RT @news4buffalo: Police say a shooter fired into a crowd    yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
} 

Success one:

String regex = "\\@\\w+ || @\\w*";
String rawContent =  "@ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
} 

Output:

couldn't agree more. Good crowd last night. #LetsGoFish

Upvotes: 0

Views: 63

Answers (4)

Pshemo

Reputation: 124215

  1. @ has no special meaning in a regex, so you don't need to escape it. Don't put \ before it as in "\\@" (it only confuses people).

  2. Don't use a Matcher to check whether the string contains something to replace and then call replaceAll, because that scans the string a second time. Just call replaceAll straight away; if there is nothing to replace, it leaves the string unchanged. By the way, call replaceAll on the Matcher instance rather than on the String, to avoid recompiling the Pattern.

  3. A regex of the form foo||bar doesn't seem right. Regex uses a single pipe | for OR, so this regex means foo OR the empty string OR bar. The empty string is a special case (every string contains it at the start, at the end, and between every pair of characters), so it can cause problems: "foo".replaceAll("|foo", "x") returns xfxoxox instead of, for instance, "xxx", because the empty alternative matches before f, which stops f from being used as the first character of foo :/
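Points 2 and 3 can be sketched roughly as below, using a shortened copy of the failing input from the question. The class and method names (EmptyAlternativeDemo, strip) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

class EmptyAlternativeDemo {
    // The question's pattern, unchanged: "||" leaves an empty alternative in the middle.
    static final Pattern ORIGINAL = Pattern.compile("\\@\\w+ || @\\w*");
    // A single alternative that requires at least one word character after '@'.
    static final Pattern FIXED = Pattern.compile("@\\w+");

    // Hypothetical helper: Matcher.replaceAll reuses the already compiled Pattern.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String tweet = "RT @news4buffalo: Police say";
        // Only empty matches are found, so the input comes back unchanged.
        System.out.println(strip(ORIGINAL, tweet)); // RT @news4buffalo: Police say
        System.out.println(strip(FIXED, tweet));    // RT : Police say
        // The empty alternative is tried first at every position.
        System.out.println("foo".replaceAll("|foo", "x")); // xfxoxox
    }
}
```

In the failing input the first alternative needs a space right after the word characters, but a colon follows, so the empty alternative wins at every position and nothing is removed.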

Anyway, it seems that you want to match any @xxxx word, so consider something like "@\\w+", which makes sure there is at least one word character after @.

You can also add a condition that @ must be the first character of a word (in case you don't want to remove the part after @ in e-mail addresses). To do this, use a look-behind such as (?<=\\s|^)@, which checks that @ is preceded by whitespace or is placed at the start of the string.
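A small sketch of the look-behind idea, assuming an input that mixes an e-mail address and a mention. The input string, the address in it, and the names MentionLookBehindDemo and stripMentions are invented for the example.

```java
import java.util.regex.Pattern;

class MentionLookBehindDemo {
    // '@' only counts at the start of the string or right after whitespace.
    static final Pattern MENTION = Pattern.compile("(?<=\\s|^)@\\w+");
    // Same pattern without the look-behind, for comparison.
    static final Pattern ANY_AT = Pattern.compile("@\\w+");

    static String stripMentions(String input) {
        return MENTION.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "mail bob@example.com or ping @bob";
        // The address survives; only the standalone mention is removed.
        System.out.println("[" + stripMentions(text) + "]");
        // prints [mail bob@example.com or ping ]
        // Without the look-behind, part of the address is removed too.
        System.out.println("[" + ANY_AT.matcher(text).replaceAll("") + "]");
        // prints [mail bob.com or ping ]
    }
}
```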

You can also remove the space after the word you are removing (if there is one).

So you can try with

String regex = "(?<=\\s|^)@\\w*\\s?";

which for data like

RT @news4buffalo: Police say a shooter fired into a crowd    yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…

will return

RT : Police say a shooter fired into a crowd    yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…

But if you also want to remove characters that \\w does not cover (it matches only letters, digits and underscore), such as :, you can simply use \\S, which matches any non-whitespace character, so your regex can look like

String regex = "(?<=\\s|^)@\\S*\\s?";
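To compare the two patterns side by side, here is a minimal sketch on a shortened copy of the tweet from the question. The class name, constant names and the strip helper are assumptions made for the example.

```java
import java.util.regex.Pattern;

class WordVsNonWhitespaceDemo {
    static final Pattern WORD_CHARS = Pattern.compile("(?<=\\s|^)@\\w*\\s?");
    static final Pattern NON_WHITESPACE = Pattern.compile("(?<=\\s|^)@\\S*\\s?");

    // Hypothetical helper that removes every match of the given pattern.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String tweet = "RT @news4buffalo: Police say a shooter fired into a crowd";
        // \w* stops at the colon, so ": " is left behind.
        System.out.println(strip(WORD_CHARS, tweet));
        // prints RT : Police say a shooter fired into a crowd
        // \S* also consumes the colon, and \s? takes the space after it.
        System.out.println(strip(NON_WHITESPACE, tweet));
        // prints RT Police say a shooter fired into a crowd
    }
}
```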

Upvotes: 0

Braj

Reputation: 46841

You can also try it this way.

String s = "@ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
System.out.println(s.replaceAll("@[^\\s]*\\s+", ""));
// Match up to the next space------^^^^  ^^^^---------remove the whitespace after it as well
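One thing worth checking with this pattern is a mention at the very end of the string: \\s+ needs at least one whitespace character, so such a mention is not matched. The sketch below shows this on an invented input; the variant with \\s* is only an illustrative alternative, not part of the answer, and the class and method names are made up.

```java
class TrailingMentionDemo {
    // The pattern from the answer: whitespace after the mention is required.
    static String stripRequiringSpace(String s) {
        return s.replaceAll("@[^\\s]*\\s+", "");
    }

    // Illustrative variant: whitespace after the mention is optional.
    static String stripOptionalSpace(String s) {
        return s.replaceAll("@[^\\s]*\\s*", "");
    }

    public static void main(String[] args) {
        String trailing = "Good crowd @ZaslowShow";
        // No whitespace follows the mention, so nothing is removed.
        System.out.println("[" + stripRequiringSpace(trailing) + "]"); // [Good crowd @ZaslowShow]
        System.out.println("[" + stripOptionalSpace(trailing) + "]");  // [Good crowd ]
    }
}
```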

Upvotes: 1

M A

Reputation: 72844

The regex only matches word characters, whereas your input String has a colon : right after the name. You can solve this by replacing \\w with \\S (any non-whitespace character) in your regex. Also, there is no need for two alternatives.

String regex = "@\\S*";
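As a quick check of what this pattern leaves behind, here is a sketch on shortened copies of both inputs from the question. The class and method names are hypothetical. Note that the whitespace around the removed part stays in place.

```java
class NonWhitespaceMentionDemo {
    // Removes '@' plus every non-whitespace character that follows it.
    static String strip(String s) {
        return s.replaceAll("@\\S*", "");
    }

    public static void main(String[] args) {
        // Brackets make the remaining spaces visible.
        System.out.println("[" + strip("RT @news4buffalo: Police say") + "]");
        // prints [RT  Police say] (two spaces remain between RT and Police)
        System.out.println("[" + strip("@ZaslowShow couldn't agree more.") + "]");
        // prints [ couldn't agree more.] (one leading space remains)
    }
}
```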

Upvotes: 0

anubhava

Reputation: 784998

From your question it looks like this regex can work for you:

rawContent = rawContent.replaceAll("@\\S*", "");

Upvotes: 1
