Reputation: 13
I have a solution that mostly works, but the regex split produces an empty string "" in addition to the two parts I need.
String Result = "<ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";
String[] Arr = Result.split("<[^>]*>");
for (String elem : Arr) {
    // println instead of printf: elem is data, not a format string
    System.out.println(elem);
}
The result is:
Arr[0]= ""
Arr[1]= Securities regulation in the United States
Arr[2]= Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.
The Arr[1] and Arr[2] splits are fine; I just can't get rid of Arr[0].
Upvotes: 1
Views: 60
Reputation: 30985
Instead of splitting on the tags, you can invert the approach and capture the text between them with a regex like this:
(?s)(?:^|>)(.*?)(?:<|$)
Code:
String line = "ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";
Pattern pattern = Pattern.compile("(?s)(?:^|>)(.*?)(?:<|$)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
}
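As a self-contained sketch of the same idea, the loop can collect the captures into a list and skip empty ones (a tag at the very start of the input yields an empty group 1). The class name `TextBetweenTags`, the method `extract`, and the shortened sample input are made up for illustration; only the pattern comes from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextBetweenTags {
    // Same pattern as above: text after the input start or a '>', up to the next '<' or the end.
    private static final Pattern BETWEEN = Pattern.compile("(?s)(?:^|>)(.*?)(?:<|$)");

    // Hypothetical helper: returns every non-empty capture in order.
    static List<String> extract(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = BETWEEN.matcher(input);
        while (matcher.find()) {
            String text = matcher.group(1);
            if (!text.isEmpty()) {   // drop the empty capture produced by a leading tag
                parts.add(text);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the string in the question.
        String line = "<a href=x>Title</a> - Desc";
        for (String part : extract(line)) {
            System.out.println("group 1: " + part);
        }
    }
}
```

Note that surrounding whitespace and the " - " separator are kept in the captured text; call `trim()` or strip the separator if you don't want them.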
Upvotes: 2
Reputation: 124215
You can't avoid that empty string if you use only split, because your regex matches a non-empty tag at the very start of the input, so split returns an empty leading element.
You could remove the match at the start of your input first, and then split on the remaining matches, like this:
String[] Arr = Result.replaceFirst("^<[^>]+>", "").split("<[^>]+>");
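Here is that one-liner as a small runnable sketch. The class name `StripLeadingTag`, the helper `splitOnTags`, and the shortened input are only for illustration; the two regexes are the ones from the line above.

```java
import java.util.Arrays;

public class StripLeadingTag {
    // Hypothetical helper: drop a tag at the very start, then split on the remaining tags.
    static String[] splitOnTags(String input) {
        return input
                .replaceFirst("^<[^>]+>", "")  // remove the leading tag, if there is one
                .split("<[^>]+>");             // split on every other tag
    }

    public static void main(String[] args) {
        // Shortened stand-in for the string in the question.
        String result = "<a href=x>Title</a> - Desc";
        String[] arr = splitOnTags(result);
        System.out.println(arr.length);            // 2, no empty leading element
        System.out.println(Arrays.toString(arr));
    }
}
```

The second element still starts with " - ", since only the tags are removed.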
But generally you should avoid using regex on HTML/XML. Try using a parser such as Jsoup instead.
Upvotes: 1