Brianzaska

Reputation: 51

How to find and replace certain string pattern in java

After extracting text from PDFs, I need to fix some artifacts. I need to replace every string of this form:

String example="the sun was shin- ing  and the sky bl- ue";

with this form:

String fixxed="the sun was shining  and the sky blue";

I'm not an expert in regular expressions. I tried the following, but it gives the wrong result:

String pattern="([\\w])+([\\-])+([\\s])";
String fixxed = text.replaceAll(pattern, "$1");

One important requirement: I only have to replace the substring if the character before '-' is a letter (not a space and not a digit).

Upvotes: 2

Views: 990

Answers (5)

Wiktor Stribiżew

Reputation: 626738

You can use the following solution with any language, even those using diacritics:

(\p{L}\p{M}*+)-\h(?=\p{L})

Or, with \h+, if there can be more than one space between the hyphen and the next letter:

(\p{L}\p{M}*+)-\h+(?=\p{L})

Replace \h with \s if there can be a line break between the parts of a split word.

See the regex demo. Replace the matches with the $1 backreference, which puts back the contents of Group 1.

  • (\p{L}\p{M}*+) - Group 1: any Unicode letter followed with 0 or more diacritics
  • - - a hyphen
  • \h+ / \s+ - one or more horizontal / any whitespace chars
  • (?=\p{L}) - a positive lookahead that requires the next char to be any Unicode letter.

See the Java code:

String text = "the sun was shin- ing  and the sky bl- ue";
System.out.println(text.replaceAll("(\\p{L}\\p{M}*+)-\\s+(?=\\p{L})", "$1"));
// => the sun was shining  and the sky blue
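
To illustrate why Group 1 includes \p{M}*+, here is a minimal sketch using a decomposed accent (a base letter followed by a combining mark). The class and method names are made up for this example and the input strings are assumptions, not taken from the question:

```java
public class DiacriticsDemo {
    // Joins a word split as "letter- letter". Group 1 keeps the last letter
    // together with any combining marks attached to it.
    static String joinSplitWords(String text) {
        return text.replaceAll("(\\p{L}\\p{M}*+)-\\s+(?=\\p{L})", "$1");
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 (combining acute accent) is a decomposed accented e.
        String decomposed = "une cafe\u0301- te\u0301ria";
        System.out.println(joinSplitWords(decomposed));

        // A digit before the hyphen is not a letter, so this input is left unchanged.
        System.out.println(joinSplitWords("items 1- 2"));
    }
}
```

Without \p{M}*+, the combining mark would sit between the letter and the hyphen and the pattern would not match there.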

Upvotes: 0

The fourth bird

Reputation: 163217

To replace the substring only if the character before - is a word character (\w, which matches letters but also digits and the underscore), you could use lookarounds to assert a word character on the left and on the right.

This will replace bl- ue with blue and also bl- u- es with blues.

(?<=\w)-\s(?=\w)

Regex demo | Java demo

For example

String example = "the sun was shin- ing and the sky bl- ue or bl- u- es";
System.out.println(example.replaceAll("(?<=\\w)-\\s(?=\\w)", ""));

Output

the sun was shining and the sky blue or blues

If you don't want to change:

bl-
ue

to

blue

You could use \h to match a horizontal whitespace char instead of using \s, which could also match a newline.

(?<=\w)-\h(?=\w)

Regex demo
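
A minimal sketch of the difference between the two variants, assuming the input contains one word split by a space and one split by a line break (the class and method names are hypothetical):

```java
public class WhitespaceDemo {
    // \s matches any whitespace, including a newline.
    static String joinAnyWhitespace(String s) {
        return s.replaceAll("(?<=\\w)-\\s(?=\\w)", "");
    }

    // \h matches horizontal whitespace only (space, tab, ...), so a line break is kept.
    static String joinHorizontalOnly(String s) {
        return s.replaceAll("(?<=\\w)-\\h(?=\\w)", "");
    }

    public static void main(String[] args) {
        String text = "shin- ing and bl-\nue";
        System.out.println(joinAnyWhitespace(text));  // both words are joined
        System.out.println(joinHorizontalOnly(text)); // only "shin- ing" is joined
    }
}
```

Note that \h in Java regular expressions requires Java 8 or later.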

Upvotes: 0

libanbn

Reputation: 270

You can capture the word characters before the hyphen and the one after the whitespace, then combine them:

public static void main(String[] args) {
    String example = "the sun was shin- ing  and the sky bl- ue a - a 1-2 1 - 2";
    String pattern = "(\\w+)-\\s(\\w)";

    String newExample = example.replaceAll(pattern, "$1$2");
    System.out.println(newExample);
}


Output:

the sun was shining  and the sky blue a - a 1-2 1 - 2

Upvotes: 2

explorer

Reputation: 1104

You can use the replaceAll() method of String to replace every match of a pattern.

According to the Oracle docs,

replaceAll(String regex, String replacement)

Replaces each substring of this string that matches the given regular expression with the given replacement.

So, for your case, you can do it like this:

 String example = "the sun was shin- ing  and the sky bl- ue";
 System.out.println(example.replaceAll("- ",""));

or

String example = "the sun was shin- ing  and the sky bl- ue";
System.out.println(example.replaceAll("\\-\\s+",""));

The output in both cases is:

 the sun was shining  and the sky blue
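
Both patterns above remove the hyphen regardless of what comes before it. A minimal sketch of how that differs from the "letter before the hyphen" requirement in the question, assuming a lookbehind on \p{L} is acceptable (the class, the method names and the sample input are made up for illustration):

```java
public class LetterOnlyDemo {
    // Removes "-" plus the following whitespace, whatever precedes the hyphen.
    static String removeAny(String s) {
        return s.replaceAll("-\\s+", "");
    }

    // Hypothetical variant: removes it only when the character before the hyphen is a letter.
    static String removeAfterLetter(String s) {
        return s.replaceAll("(?<=\\p{L})-\\s+", "");
    }

    public static void main(String[] args) {
        String text = "shin- ing, a - b, 1- 2";
        System.out.println(removeAny(text));         // shining, a b, 12
        System.out.println(removeAfterLetter(text)); // shining, a - b, 1- 2
    }
}
```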

Upvotes: 0

Arvind Kumar Avinash

Reputation: 79015

Do it as follows:

public class Main {
    public static void main(String[] args) {
        String example = "the sun was shin- ing  and the sky bl- ue";
        example = example.replaceAll("\\-\\s+", "");
        System.out.println(example);
    }
}

Output:

the sun was shining  and the sky blue

Upvotes: 3
