Reputation: 43
I'm trying to split a string without removing the matched delimiter. I was partly successful, as I found that this can be done using (?<=-)|(?=-)
But now, when I apply the same idea to extract a link,
using this regex:
((?<=(http:\\/\\/\\S+))|(?=(http:\\/\\/\\S+)))
I get weird output.
In fact, splitting this input:
A wonderful serenity has taken possession of http://www.google.com my entire soul,\n like these sweet mornings of spring which I enjoy with my whole heart.
gives me this set of strings:
["A wonderful serenity has taken possession of ", "http://w", "w", "w", ".", "g", "o", "o", "g", "g", "l", "e", ".", "c", "o", "m", "my entire soul,\n like these sweet mornings of spring which I enjoy with my whole heart."]
EDIT: The successful output should be:
["A wonderful serenity has taken possession of ", "http://www.google.com", "my entire soul,\n like these sweet mornings of spring which I enjoy with my whole heart."]
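For reference, here is a minimal Java sketch of the single-character case that does work. Java is assumed here, and the class and method names are placeholders rather than anything from my real code:

```java
import java.util.Arrays;

final class KeepDelimiterDemo {
    // Hypothetical helper: splits at every position that is directly after
    // or directly before a '-', so the dashes survive as their own tokens.
    static String[] splitKeepingDashes(String s) {
        return s.split("(?<=-)|(?=-)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingDashes("a-b-c")));
        // prints [a, -, b, -, c]
    }
}
```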
Upvotes: 0
Views: 93
Reputation: 520898
One viable option here would be to iterate over the matches with a regex Matcher instead of splitting, and search for the following pattern:
\\bhttps?://\\S+\\b|.*?(?=https?://|$)
This pattern first tries to match a URL. If it cannot, it matches all content up to, but not including, either the next URL or the end of the input. Here is some sample code:
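import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "A wonderful serenity has taken possession of http://www.google.com my entire soul,\n like these sweet mornings of spring which I enjoy with my whole heart.";
// Alternative 1 matches a URL; alternative 2 lazily matches the text before the next URL or the end of the input.
String pattern = "\\bhttps?://\\S+\\b|.*?(?=https?://|$)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
List<String> matches = new ArrayList<>();
// Collect every match in order of appearance.
while (m.find()) {
    matches.add(m.group());
}
System.out.println(matches);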
This prints:
[A wonderful serenity has taken possession of ,
http://www.google.com,
like these sweet mornings of spring which I enjoy with my whole heart., ]
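Note that the output above is missing " my entire soul," and ends with an empty string. By default `.` does not match a line terminator, so the lazy alternative cannot get past the `\n`, and the last entry is a zero-length match at the end of the input. A possible variant, offered as a sketch rather than a definitive fix, compiles the pattern with `Pattern.DOTALL`, uses `\z` instead of `$`, and skips empty matches. The class name `UrlSplitter` and the method `splitKeepingUrls` are made up for illustration, and the sketch assumes every `http://` is followed by at least one word character:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlSplitter {
    // Same two alternatives as above. DOTALL lets '.' cross line breaks, and
    // \z (absolute end of input) replaces $ so a trailing line break is kept.
    private static final Pattern TOKEN =
            Pattern.compile("\\bhttps?://\\S+\\b|.*?(?=https?://|\\z)", Pattern.DOTALL);

    // Hypothetical helper: returns the URLs and the text between them, in order.
    static List<String> splitKeepingUrls(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) { // skip the zero-length match at the end
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "A wonderful serenity has taken possession of http://www.google.com"
                + " my entire soul,\n like these sweet mornings of spring which I enjoy with my whole heart.";
        // Brackets make leading and trailing spaces visible.
        splitKeepingUrls(input).forEach(p -> System.out.println("[" + p + "]"));
    }
}
```

With this variant the input yields three pieces: the text before the URL, the URL itself, and everything after it including the line break. The third piece keeps its leading space, so call `trim()` or `strip()` on it if the exact output from the question is required.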
Upvotes: 2