Reputation: 51
After extracting text from PDFs I need to fix some bugs. I need to replace every string like this:
String example="the sun was shin- ing and the sky bl- ue";
with this form:
String fixxed="the sun was shining and the sky blue";
I'm not an expert in regular expressions. I tried the following, but it doesn't work:
String pattern="([\\w])+([\\-])+([\\s])";
String fixxed = text.replaceAll(pattern, "$1");
One important requirement: I only want to replace the substring if the character before '-' is a letter (not a space and not a number).
Upvotes: 2
Views: 990
Reputation: 626738
You can use the following solution with any language, even those using diacritics:
(\p{L}\p{M}*+)-\h(?=\p{L})
Or, with \h+, if there can be more than one space between the hyphen and the following letter:
(\p{L}\p{M}*+)-\h+(?=\p{L})
Replace \h with \s if there can be a line break between the parts of a torn word.
See the regex demo. Replace the matches with the $1 replacement backreference, which inserts the contents of Group 1.
(\p{L}\p{M}*+) - Group 1: any Unicode letter followed by 0 or more diacritics
- - a hyphen
\h+ / \s+ - one or more horizontal / any whitespace chars
(?=\p{L}) - a positive lookahead that requires the next char to be any Unicode letter.
See the Java code:
String text = "the sun was shin- ing and the sky bl- ue";
System.out.println(text.replaceAll("(\\p{L}\\p{M}*+)-\\s+(?=\\p{L})", "$1"));
// => the sun was shining and the sky blue
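The behaviour described above can be checked with a small sketch. The class name TornWordJoiner and the helper join are hypothetical names introduced only for this illustration, and the sample strings are made up. It shows that a digit or a space before the hyphen leaves the text untouched, and that a letter carrying a combining diacritic (here i followed by U+0308) is still treated as a letter:

```java
import java.util.regex.Pattern;

public class TornWordJoiner {
    // Same pattern as in the answer above, compiled once for reuse.
    private static final Pattern TORN_WORD =
            Pattern.compile("(\\p{L}\\p{M}*+)-\\s+(?=\\p{L})");

    // Hypothetical helper: joins "shin- ing" into "shining".
    public static String join(String text) {
        return TORN_WORD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Letter before the hyphen: the parts are joined.
        System.out.println(join("the sun was shin- ing and the sky bl- ue"));
        // Digit or space before the hyphen: nothing changes.
        System.out.println(join("pages 1- 2, a - b"));
        // Decomposed "naïve" (i + combining diaeresis) is joined as well.
        System.out.println(join("nai\u0308- ve").equals("nai\u0308ve"));
    }
}
```

Compiling the pattern once is only a convenience when many strings are processed; String.replaceAll with the same regex gives the same result.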
Upvotes: 0
Reputation: 163217
To only replace the substring if the character before - is a letter (using \w to match a word character, which in Java also covers digits and the underscore), you could use lookarounds to assert a word character on the left and on the right.
This will replace bl- ue with blue and also replace bl- u- es with blues:
(?<=\w)-\s(?=\w)
For example
String example = "the sun was shin- ing and the sky bl- ue or bl- u- es";
System.out.println(example.replaceAll("(?<=\\w)-\\s(?=\\w)", ""));
Output
the sun was shining and the sky blue or blues
If you don't want to change
bl-
ue
to
blue
you could use \h to match a horizontal whitespace char instead of \s, which could also match a newline:
(?<=\w)-\h(?=\w)
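A minimal sketch of the difference between the two patterns, using made-up inputs. The class and method names are hypothetical. The third method is not part of the answer above: it is an assumed variant that swaps \w for \p{L}, for the case where digits before the hyphen must be left alone, as the question requires.

```java
public class LookaroundDemo {
    // \s also matches a line break, so a word torn across lines is joined.
    public static String joinAnyWhitespace(String s) {
        return s.replaceAll("(?<=\\w)-\\s(?=\\w)", "");
    }

    // \h matches only horizontal whitespace (space, tab, ...).
    public static String joinHorizontalOnly(String s) {
        return s.replaceAll("(?<=\\w)-\\h(?=\\w)", "");
    }

    // Assumed variant: \p{L} accepts letters only, so "1- 2" is not touched.
    public static String joinLettersOnly(String s) {
        return s.replaceAll("(?<=\\p{L})-\\h(?=\\p{L})", "");
    }

    public static void main(String[] args) {
        System.out.println(joinAnyWhitespace("bl-\nue"));      // blue
        System.out.println(joinHorizontalOnly("bl- u- es"));   // blues
        System.out.println(joinHorizontalOnly("page 1- 2"));   // page 12
        System.out.println(joinLettersOnly("page 1- 2"));      // page 1- 2
    }
}
```

Because the lookarounds consume no characters, the replacement is an empty string and no capture group is needed.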
Upvotes: 0
Reputation: 270
You can capture the word characters before the hyphen and the first word character after the whitespace, then combine them:
public class Main {
    public static void main(String[] args) {
        String example = "the sun was shin- ing and the sky bl- ue a - a 1-2 1 - 2";
        // Group 1: word characters before the hyphen; group 2: the character after the whitespace
        String pattern = "(\\w+)-\\s(\\w)";
        String newExample = example.replaceAll(pattern, "$1$2");
        System.out.println(newExample);
    }
}
Output:
the sun was shining and the sky blue a - a 1-2 1 - 2
Upvotes: 2
Reputation: 1104
You can use the replaceAll() method of String to replace every substring that matches a pattern.
According to the Oracle docs:
replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
So, in your case you can do it like this:
String example = "the sun was shin- ing and the sky bl- ue";
System.out.println(example.replaceAll("- ",""));
or
String example = "the sun was shin- ing and the sky bl- ue";
System.out.println(example.replaceAll("\\-\\s+",""));
The output in both cases is:
the sun was shining and the sky blue
Upvotes: 0
Reputation: 79015
Do it as follows:
public class Main {
    public static void main(String[] args) {
        String example = "the sun was shin- ing and the sky bl- ue";
        example = example.replaceAll("\\-\\s+", "");
        System.out.println(example);
    }
}
Output:
the sun was shining and the sky blue
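Note that this pattern removes every hyphen followed by whitespace, whatever precedes it. The sketch below compares it with a guarded variant on a made-up input. The class and method names are hypothetical, and the lookbehind (?<=\p{L}) is an assumption added here to reflect the question's requirement that the character before the hyphen be a letter. It is not part of the answer above.

```java
public class UnconditionalVsGuarded {
    // Pattern from the answer: removes any hyphen followed by whitespace.
    // (The backslash before the hyphen is harmless but not required here.)
    public static String stripAll(String s) {
        return s.replaceAll("\\-\\s+", "");
    }

    // Assumed variant: only strip when a letter directly precedes the hyphen.
    public static String stripAfterLetter(String s) {
        return s.replaceAll("(?<=\\p{L})-\\s+", "");
    }

    public static void main(String[] args) {
        String input = "bl- ue, a - b, 1- 2";
        System.out.println(stripAll(input));         // blue, a b, 12
        System.out.println(stripAfterLetter(input)); // blue, a - b, 1- 2
    }
}
```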
Upvotes: 3