Harshit

Reputation: 1217

Using regex to strip out titles such as Dr. or Mr. from a name

I am trying to strip out the titles listed in the pattern below using a regex, but the replacement also removes characters I want to keep. Specifying a word boundary does not help in this case.

    String name = "Dr.Dre";
    Pattern p = Pattern.compile("(Mr.|MR.|Dr.|mr.|DR.|dr.|ms.|Ms.|MS.|Miss.|Mrs.|mrs.|miss.|MR|mr|Mr|Dr|DR|dr|ms|Ms|MS|miss|Miss|Mrs|mrs)"+"\\b");
    Matcher m = p.matcher(name);
    String namef = m.replaceAll("");
    System.out.println(namef);

Input: Dr.Dre or Dr Dre or Dr. Dre

Expected output: Dre or Dre or Dre

Edit:

Thanks for the help, but I am still facing a small regex issue with this program:

String name = "Dr. Dre" ;  
Pattern p = Pattern.compile("(Mr\\.|MR\\.|Dr\\.|mr\\.|DR\\.|dr\\.|ms\\.|Ms\\.|MS\\.|Miss\\.|Mrs\\.|mrs\\.|miss\\.|MR|mr|Mr|Dr|DR|dr|ms|Ms|MS|miss|Miss|Mrs|mrs)"+"\\b");
Matcher m = p.matcher(name);
String namef = m.replaceAll(""); 
System.out.println(namef);

For the above program the output is ". Dre", while the desired output is "Dre".

Upvotes: 1

Views: 812

Answers (2)

Jon Skeet

Reputation: 1500245

A dot in a regular expression means "any character". To match a literal period you need to escape it with a backslash, which in turn needs to be escaped in a Java string literal:

Pattern p = Pattern.compile("Mr\\.|MR\\.|Dr\\.|mr\\.|DR\\.|dr\\.|ms\\."); // etc

Note that you'll end up with a double space after removing "Dr." from "or Dr. Dre" though...
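
To see the effect of the unescaped dot, here is a minimal sketch that runs both forms of the pattern against "Dr.Dre". The class and method names are hypothetical, chosen only for illustration, and the pattern is reduced to a single title to keep it short:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class DotEscapeDemo {
    // Unescaped: the dot matches any character, so "Dre" also matches "Dr.".
    static final Pattern UNESCAPED = Pattern.compile("Dr.");
    // Escaped: "\\." in the string literal is the regex \. (a literal period).
    static final Pattern ESCAPED = Pattern.compile("Dr\\.");

    static String stripUnescaped(String s) {
        return UNESCAPED.matcher(s).replaceAll("");
    }

    static String stripEscaped(String s) {
        return ESCAPED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Both "Dr." and "Dre" match the unescaped pattern, so nothing is left.
        System.out.println("[" + stripUnescaped("Dr.Dre") + "]"); // prints []
        // Only the literal "Dr." is removed.
        System.out.println("[" + stripEscaped("Dr.Dre") + "]");   // prints [Dre]
    }
}
```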

EDIT: A space after a dot doesn't count as a word boundary, because \b only matches between a word character and a non-word character, and both the dot and the space are non-word characters. If you change your pattern to use \\s instead of \\b, so that it consumes a single whitespace character, it works for "Dr. Dre", but as noted in comments it then fails for "Dr.Dre". You could either remove the word boundary entirely and add a space to the later parts of the pattern ("DR |Dr |" etc.) or use (\\s|\\b), which works for the cases I tried it on but may well have other undesirable side effects.
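
The boundary behaviour described above can be checked directly, and one possible way around it is to put the word boundary after the letters of the title rather than after the dot, then make the dot and any following whitespace optional. The sketch below is only one option under stated assumptions: the TitleStripper class, its method names, the title list, and the use of the (?i) case-insensitive flag in place of spelling out every capitalisation are all illustrative choices, not something from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, shown only as a sketch.
class TitleStripper {
    // \b before and after the letters keeps "Dre" or "Missy" from matching;
    // \.? takes an optional literal period and \s* any whitespace after it.
    static final Pattern TITLE =
            Pattern.compile("(?i)\\b(?:mrs|mr|ms|miss|dr)\\b\\.?\\s*");

    // Literal "Dr." followed by a word boundary, as in the edited question.
    static final Pattern DOT_THEN_BOUNDARY = Pattern.compile("Dr\\.\\b");

    static String strip(String name) {
        return TITLE.matcher(name).replaceAll("");
    }

    // True only if "Dr." is directly followed by a word character.
    static boolean boundaryAfterDot(String name) {
        return DOT_THEN_BOUNDARY.matcher(name).find();
    }

    public static void main(String[] args) {
        // '.' and ' ' are both non-word characters, so no \b between them.
        System.out.println(boundaryAfterDot("Dr. Dre")); // prints false
        System.out.println(boundaryAfterDot("Dr.Dre"));  // prints true

        System.out.println(strip("Dr.Dre"));  // prints Dre
        System.out.println(strip("Dr Dre"));  // prints Dre
        System.out.println(strip("Dr. Dre")); // prints Dre
    }
}
```

Because \s* consumes the whitespace after the title, this also avoids the leftover double space when the title appears mid-sentence. It has not been checked against names beyond the ones shown, so treat it as a starting point.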

Upvotes: 7

tzhechev

Reputation: 137

The question is a bit unclear (you aren't providing the problematic results), but my guess is that the problem lies in the period character. The period has a special meaning in regex: it matches ANY character, so "Dr." will also match the "Dre" in "Dr.Dre". You have to escape it like so: "Dr\.", or in your Java code specifically, where the backslash itself must be escaped, like this: "Dr\\."

Hope that helps!

Upvotes: 2
