nafas

Reputation: 5423

How to apply multiple regex replacements in a single pass

I have a set of regex replacements that need to be applied to a set of Strings.

For example:

  1. replace every run of two or more whitespace characters with a single space ("\s{2,}" --> " ")
  2. replace every . followed by a letter with . followed by a space and that letter ("\.([a-zA-Z])" --> ". $1")

So I will have something like this:

String s="hello     .how are you?";
s=s.replaceAll("\\s{2,}"," ");
s=s.replaceAll("\\.([a-zA-Z])",". $1");
....

It works, but imagine applying 100+ such expressions to a long String: every replaceAll call compiles its regex and makes another full pass over the input, so this gets slow quickly.

So my question is whether there is a more efficient way to combine these replacements into a single replaceAll (or something similar, e.g. Pattern/Matcher).

I have looked at Java Replacing multiple different..., but the problem is that my patterns are regular expressions, not plain Strings.

Upvotes: 5

Views: 8184

Answers (2)

Aseem Bansal

Reputation: 6962

Look at Replace multiple substrings at Once and modify it.

Use a Map<Integer, Function<Matcher, String>>.

  • capturing-group numbers as Integer keys
  • lambdas that build the replacement text as values

Modify the loop to check which group took part in the current match, then use that group number to look up the replacement lambda.

Pseudo code

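import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Key: capturing-group number in the combined pattern.
// Value: lambda that builds the replacement when that group matched.
Map<Integer, Function<Matcher, String>> replacements = new LinkedHashMap<>();
replacements.put(1, matcher -> " ");                      // run of 2+ whitespace -> one space
replacements.put(2, matcher -> ". " + matcher.group(2));  // ".x" -> ". x"

String input = "hello     .how are you?";

// Join the alternatives with '|'; each alternative gets its own capturing group
String regexp = "(\\s{2,})|\\.([a-zA-Z])";

StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(input);

while (m.find()) {
    // A group that did not take part in this match returns null
    for (Map.Entry<Integer, Function<Matcher, String>> e : replacements.entrySet()) {
        if (m.group(e.getKey()) != null) {
            // quoteReplacement keeps '$' and '\' in the result literal
            m.appendReplacement(sb, Matcher.quoteReplacement(e.getValue().apply(m)));
            break;
        }
    }
}
m.appendTail(sb);

System.out.println(sb.toString());   // hello . how are you?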
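
On Java 9 or later the manual find/appendReplacement loop can probably be dropped in favour of Matcher.replaceAll(Function<MatchResult, String>). The following is only a sketch of that variant for the two rules from the question; the class name CombinedReplace, the method normalize and the HANDLERS list are made up for illustration and are not part of any library.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class CombinedReplace {
    // One capturing group per rule, joined with '|', compiled once.
    private static final Pattern COMBINED =
            Pattern.compile("(\\s{2,})|\\.([a-zA-Z])");

    // HANDLERS.get(i) builds the replacement for capturing group i + 1.
    private static final List<Function<MatchResult, String>> HANDLERS = List.of(
            mr -> " ",                  // group 1: run of whitespace -> one space
            mr -> ". " + mr.group(2));  // group 2: ".x" -> ". x"

    static String normalize(String input) {
        return COMBINED.matcher(input).replaceAll(mr -> {
            for (int g = 1; g <= mr.groupCount(); g++) {
                if (mr.group(g) != null) {
                    // The returned text is treated as a replacement string,
                    // so quote it to keep '$' and '\' literal.
                    return Matcher.quoteReplacement(HANDLERS.get(g - 1).apply(mr));
                }
            }
            // No rule group matched (not expected here): keep the text as is.
            return Matcher.quoteReplacement(mr.group());
        });
    }

    public static void main(String[] args) {
        System.out.println(normalize("hello     .how are you?"));
    }
}
```

Note that a single pass is not always equivalent to running the rules one after another: a later rule never sees the output of an earlier one, so rules that depend on each other still have to be applied sequentially.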

Upvotes: 1

anubhava

Reputation: 785156

You have these 2 replaceAll calls:

s = s.replaceAll("\\s{2,}"," ");
s = s.replaceAll("\\.([a-zA-Z])",". $1");

You can combine them into a single replaceAll like this:

s = s.replaceAll("\\s{2,}|(\\.)(?=[a-zA-Z])", "$1 ");

RegEx Demo
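
The combined call relies on how $1 behaves when group 1 did not take part in the match: in the OpenJDK implementation it expands to the empty string, so a whitespace run becomes " " and a dot before a letter becomes ". ". Below is a small sketch that compares the two versions on sample inputs and compiles the combined pattern only once; the class name SingleCallDemo and its method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class SingleCallDemo {
    // Compiled once and reused, instead of recompiling on every String.replaceAll call.
    private static final Pattern COMBINED =
            Pattern.compile("\\s{2,}|(\\.)(?=[a-zA-Z])");

    // The original two-step version from the question.
    static String twoCalls(String s) {
        s = s.replaceAll("\\s{2,}", " ");
        return s.replaceAll("\\.([a-zA-Z])", ". $1");
    }

    // The single-pass version: "$1 " is "" + " " for whitespace runs
    // and "." + " " for a dot that is followed by a letter.
    static String oneCall(String s) {
        return COMBINED.matcher(s).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String input = "hello     .how are you?";
        System.out.println(twoCalls(input));
        System.out.println(oneCall(input));
    }
}
```

The lookahead (?=[a-zA-Z]) checks the letter without consuming it, which is why the replacement does not need to put the letter back.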

Upvotes: 4
