kane

Reputation: 6027

Replace whole tokens that may contain regular expression characters

I want to do a startStr.replaceAll(searchStr, replaceStr) and I have two requirements.

  1. The searchStr must be a whole word, meaning it must be delimited on each side by a space, the beginning of the string, or the end of the string.
    • e.g.
      • startStr = "ON cONfirmation, put ON your hat"
      • searchStr = "ON"
      • replaceStr = ""
      • expected = " cONfirmation, put your hat"
  2. The searchStr may contain regex metacharacters, which must be matched literally
    • e.g.
      • startStr = "remove this * thing"
      • searchStr = "*"
      • replaceStr = ""
      • expected = "remove this thing"

For requirement 1, I've found that this works:

startStr.replaceAll("\\b"+searchStr+"\\b",replaceStr)

For requirement 2, I've found that this works:

startStr.replaceAll(Pattern.quote(searchStr), replaceStr)

But I can't get them to work together:

startStr.replaceAll("\\b"+Pattern.quote(searchStr)+"\\b", replaceStr)

Here is a simple test case that fails:

startStr = "remove this * thing but not this*"

searchStr = "*"

replaceStr = ""

expected = "remove this thing but not this*"

actual = "remove this * thing but not this*"

What am I missing?

Thanks in advance

Upvotes: 1

Views: 79

Answers (4)

Rijo Joseph

Reputation: 1405

You can use (^| )\*( |$) instead of \\b.

Try this: startStr.replaceAll("(^| )yourSearchString( |$)", replaceStr);

Upvotes: 0

sdanzig

Reputation: 4500

First off, \b, the word boundary, is not going to work for you with the asterisk. The reason is that \b only matches between a word character and a non-word character, with the start and end of the string counting as non-word. The regex engine does not treat * as a word character, so a token that begins or ends with * is not surrounded by word boundaries when it sits next to whitespace.

Reference pages: http://www.regular-expressions.info/wordboundaries.html http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
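
To make the point above concrete, here is a small sketch that runs the \b-based replacement from the question against both test strings. The class name WordBoundaryDemo and the helper replaceWithWordBoundaries are made up for illustration; only String.replaceAll and Pattern.quote come from the JDK.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: wraps the quoted token in \b anchors, as attempted in the question.
    static String replaceWithWordBoundaries(String input, String token, String replacement) {
        return input.replaceAll("\\b" + Pattern.quote(token) + "\\b", replacement);
    }

    public static void main(String[] args) {
        // "ON" is made of word characters, so \b finds boundaries around the standalone tokens.
        // prints [ cONfirmation, put  your hat] (note the doubled space after "put")
        System.out.println("[" + replaceWithWordBoundaries("ON cONfirmation, put ON your hat", "ON", "") + "]");

        // "*" is a non-word character: there is no boundary between a space and "*",
        // so nothing matches and the input comes back unchanged.
        // prints [remove this * thing but not this*]
        System.out.println("[" + replaceWithWordBoundaries("remove this * thing but not this*", "*", "") + "]");
    }
}
```

Note that \b can also match in places you probably don't want: in "a*b" the asterisk has a word character on both sides, so \b\*\b matches there and the result is "ab".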

An option you might like is to supply wildcard permutations in a regex:

(?<=\s|^)(ON|\*N|O\*|\*)(?=\s|$)

Here's a Java example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class RegExTest
{
  public static void main(String[] args){
    String sourcestring = "ON cONfirmation, put * your hat";
    sourcestring = sourcestring.replaceAll("(?<=\\s|^)(ON|\\*N|O\\*|\\*)(?=\\s|$)","").replaceAll("  "," ").trim();
    System.out.println("sourcestring=["+sourcestring+"]");
  }
}

You can write a little function to generate the wildcard permutations automatically. I admit I cheated a little with the spaces, but I don't think that was a requirement anyway.

Play with it online here: http://ideone.com/7uGfIS

Upvotes: 1

traybold

Reputation: 444

The pattern "\\b" matches a word boundary: a position with a word character on one side and a non-word character (or the start or end of the string) on the other. * is not a word character, so \\b\\*\\b won't match a * that is surrounded by spaces. Look-behind and look-ahead assert what surrounds a match without consuming it. You can require that the beginning of the string or whitespace comes before your pattern and that whitespace or the end of the string follows it:

startStr.replaceAll("(?<=^|\\s)"+Pattern.quote(searchStr)+"(?=\\s|$)", replaceStr)
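
As a rough, self-contained sketch of the one-liner above: the class WholeTokenReplace and the method replaceWholeToken are names invented here, and the call to Matcher.quoteReplacement is an extra precaution not in the original line, on the assumption that replaceStr may contain $ or \ and should be inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeTokenReplace {
    // Hypothetical helper: replaces token only where it is delimited by
    // whitespace or by the start/end of the input.
    static String replaceWholeToken(String input, String token, String replacement) {
        // Pattern.quote makes the token literal; the lookarounds check the
        // delimiters without consuming them.
        String regex = "(?<=^|\\s)" + Pattern.quote(token) + "(?=\\s|$)";
        // quoteReplacement keeps "$" and "\" in the replacement literal (assumption: that is wanted).
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints [remove this  thing but not this*]
        System.out.println("[" + replaceWholeToken("remove this * thing but not this*", "*", "") + "]");
        // prints [ cONfirmation, put  your hat]
        System.out.println("[" + replaceWholeToken("ON cONfirmation, put ON your hat", "ON", "") + "]");
    }
}
```

Because the lookarounds leave the surrounding whitespace in place, removing a token from the middle of the string leaves two adjacent spaces. If you need the single-space result from the question, you would have to collapse the whitespace in a separate step.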

Upvotes: 1

newuser

Reputation: 8466

Try this:

For removing "ON"

        StringBuilder stringBuilder = new StringBuilder();
        String[] splittedValue = startStr.split(" ");
        for (String value : splittedValue)
        {
            if (!value.equalsIgnoreCase("ON"))
            {
                stringBuilder.append(value);
                stringBuilder.append(" ");
            }
        }
        System.out.println(stringBuilder.toString().trim());

For removing "*"

    String startStr1 = "remove this * thing";
    System.out.println(startStr1.replaceAll("\\*[\\s]", ""));
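
The split-based loop above can be generalized so that it handles any token, "*" included, without building a regex from searchStr at all. This is only a sketch: SplitFilterDemo and removeToken are made-up names, it assumes tokens are separated by single spaces, and it compares case-sensitively with equals rather than equalsIgnoreCase.

```java
import java.util.ArrayList;
import java.util.List;

class SplitFilterDemo {
    // Hypothetical helper: drops every space-delimited word equal to token.
    static String removeToken(String input, String token) {
        List<String> kept = new ArrayList<>();
        for (String word : input.split(" ")) {
            // Plain string comparison, so "*" needs no escaping.
            if (!word.equals(token)) {
                kept.add(word);
            }
        }
        // Re-join the surviving words with single spaces.
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // prints [cONfirmation, put your hat]
        System.out.println("[" + removeToken("ON cONfirmation, put ON your hat", "ON") + "]");
        // prints [remove this thing but not this*]
        System.out.println("[" + removeToken("remove this * thing but not this*", "*") + "]");
    }
}
```

This only removes tokens. Replacing them with a non-empty replaceStr would mean adding the replacement to the list instead of skipping the word.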

Upvotes: 0
