ZeExplorer

Reputation: 553

Dealing with apostrophes in a Java regex used with replaceAll

I'm trying to replace only the EXACT & WHOLE occurrences of a pattern using the following code. Apparently the "you" in "you'll" is also being replaced, giving "@@@'ll". What I want is for only the standalone word "you" to be replaced.

Please suggest.

import java.util.*;
import java.io.*;

public class Fielreadingtest {

    public static void main(String[] args) throws IOException {
        String MyText = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed. ";
        String newLine = System.getProperty("line.separator");
        System.out.println("Before:" + newLine + MyText);
        String pattern = "\\byou\\b";
        MyText = MyText.replaceAll(pattern, "@@@");
        System.out.println("After:" + newLine + MyText);
    }
}

/*
Before:
I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed. 
After:
I knew about @@@ long before I met @@@. I also know that @@@’re an awesome person. By the way @@@’ll be missed. 

*/

That said, I have an input file containing a list of words that I want to skip. It looks like this: [image not available]

Now, as per @Anubhav, I have to use (^|\\s)you([\\s.]|$) to replace exactly "you" and nothing else. Is my best bet to use a tool like Notepad++ to add that prefix and suffix to every word in my input file, or should I change something in the code itself? The code I'm using is this:

for (String pattern : patternsToSkip) {
    line = line.replaceAll(pattern, "");
}

source: https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount2_source.html?scroll=topic_7_1
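One way to handle this in code rather than in Notepad++ is to leave the skip file as plain words and wrap each word in lookarounds when it is loaded. This is a sketch under assumptions, not part of the Cloudera tutorial: `wholeWord` and `removeSkipWords` are hypothetical helpers, the skip list is hard-coded instead of read from the file, and `List.of` needs Java 9 or later. Lookarounds check the neighbouring characters without consuming them, so no surrounding space or punctuation has to be restored, and adjacent matches such as "you you" are both found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SkipWholeWords {

    // Hypothetical helper: turns a plain skip word into a pattern that
    // matches only standalone occurrences of that word.
    static Pattern wholeWord(String word) {
        // (?<!\w)      : not preceded by a letter, digit or underscore
        // Pattern.quote : the word is matched literally, even if it contains '.' or '+'
        // final lookahead: not followed by a word character, a straight
        //                  apostrophe, or U+2019 (written as a regex escape)
        return Pattern.compile("(?<!\\w)" + Pattern.quote(word) + "(?![\\w'\\u2019])");
    }

    // Hypothetical helper: removes every whole-word occurrence of each skip word.
    static String removeSkipWords(String line, List<String> skipWords) {
        // Compile each pattern once instead of re-parsing a String per replaceAll call
        List<Pattern> patternsToSkip = new ArrayList<>();
        for (String w : skipWords) {
            patternsToSkip.add(wholeWord(w));
        }
        for (Pattern p : patternsToSkip) {
            line = p.matcher(line).replaceAll("");
        }
        return line;
    }

    public static void main(String[] args) {
        String line = "By the way you\u2019ll be missed, but you won't.";
        System.out.println(removeSkipWords(line, List.of("you", "the")));
    }
}
```

With the sample line, "the" and the second "you" are removed while "you’ll" is left intact; the spaces around the removed words stay, just as with the original `replaceAll(pattern, "")` loop.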

Upvotes: 2

Views: 387

Answers (2)

anubhava

Reputation: 785108

You can instead use this regex:

String pattern = "(^|\\s)you([\\s.,;:-]|$)";

This will match "you" only when it is:

  • at the start of the input, or preceded by whitespace, and
  • at the end of the input, or followed by whitespace or one of the listed punctuation characters
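Because the groups (^|\s) and ([\s.,;:-]|$) are part of the match, replacing with "@@@" alone also deletes the whitespace or punctuation around "you". A minimal sketch that puts them back with the group references $1 and $2 (the class and method names are only illustrative):

```java
public class CaptureDelimiters {

    // Applies the regex from this answer with a caller-supplied replacement.
    static String replaceWhole(String text, String replacement) {
        return text.replaceAll("(^|\\s)you([\\s.,;:-]|$)", replacement);
    }

    public static void main(String[] args) {
        String text = "I knew about you long before I met you. By the way you\u2019ll be missed.";
        // "@@@" alone: the space before "you" and the space/period after it are lost
        System.out.println(replaceWhole(text, "@@@"));
        // $1 and $2 re-insert whatever the two groups matched
        System.out.println(replaceWhole(text, "$1@@@$2"));
        // The shared space is consumed by the first match, so the second "you" is missed
        System.out.println(replaceWhole("you you", "$1@@@$2"));
    }
}
```

The last line shows a limitation of this approach: the delimiter between two adjacent matches can only be consumed once, so "you you" becomes "@@@ you". Lookarounds such as (?<!\S) and (?=[\s.,;:-]|$) avoid that, because they test the neighbouring characters without consuming them.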

Upvotes: 1

aliteralmind

Reputation: 20163

You can use a negative lookahead:

\b(you)\b(?!['’])

Escaped for a Java string:

"\\b(you)\\b(?!['’])"

Your demo input uses a different apostrophe (’, U+2019) from the one on my keyboard ('), so I've put both in the negative lookahead.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

/**
   <P>{@code java ReplaceYouWholeWordWithAtAtAt}</P>
 **/
public class ReplaceYouWholeWordWithAtAtAt  {
   public static final void main(String[] ignored)  {

      // Whole word "you" (the trailing \b also rules out "your", "young", ...)
      // that is not followed by either kind of apostrophe
      String sRegex = "\\byou\\b(?!['’])";

      String sToSearch = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.";
      String sRplcWith = "@@@";

      Matcher m = Pattern.compile(sRegex).matcher(sToSearch);
      StringBuffer sb = new StringBuffer();
      // Copy the text up to each match into sb, followed by the replacement
      while (m.find())  {
         m.appendReplacement(sb, sRplcWith);
      }
      // Copy whatever follows the last match
      m.appendTail(sb);

      System.out.println(sb);
   }
}

Output:

[C:\java_code\]java ReplaceYouWholeWordWithAtAtAt
I knew about @@@ long before I met @@@. I also know that you’re an awesome person. By the way you’ll be missed.
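A negative lookahead on its own only rules out the apostrophes: if the pattern has no \b after "you", words that merely begin with "you", such as "your" or "young", are still rewritten. The comparison below uses `String.replaceAll`, which gives the same result as the `appendReplacement` loop when the replacement is a fixed string like "@@@"; the class and constant names are made up for this illustration.

```java
public class TrailingBoundary {

    // Lookahead only: rejects "you" followed by a straight apostrophe or U+2019
    // (U+2019 is written as a regex escape so the source encoding does not matter)
    static final String LOOKAHEAD_ONLY = "\\byou(?!['\\u2019])";

    // Adds a trailing word boundary, so "your" and "young" are left alone as well
    static final String WHOLE_WORD = "\\byou\\b(?!['\\u2019])";

    public static void main(String[] args) {
        String text = "Is your young friend with you? I think you\u2019re right.";
        // Also rewrites the start of "your" and "young"
        System.out.println(text.replaceAll(LOOKAHEAD_ONLY, "@@@"));
        // Rewrites only the standalone "you"
        System.out.println(text.replaceAll(WHOLE_WORD, "@@@"));
    }
}
```

For the text in the question both patterns give the same output, since every "you" there is followed by a space, a period or an apostrophe; the difference only shows up with words like "your".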

Upvotes: 1
