Reputation: 1102
I am trying to extract the data between <a href> tags in a Java string. I can achieve this with replaceAll, substring, indexOf, etc.
I would like to know how I can get the data using a regex instead.
Basically, I am trying to extract the data and store it in a string or in a list.
String data ="12345";
String sampleStr ="";
for(int i=0; i<10; i++) {
data+=i;
sampleStr += "<a href=\"javascript:yyy_getDetail(\'"+data+"\')\">"+data+"</a>"+", ";
}
System.out.println(sampleStr);
String temp = sampleStr.substring(sampleStr.indexOf("\">")+2);
Any suggestions in this regard would be appreciated. What should the regex be so that I extract only the data?
Upvotes: 6
Views: 2028
Reputation: 6609
Please use an HTML/XML parser instead. Your life will be much easier.
Real-world HTML is often inconsistent, so you can't be sure a regex will match it the way you expect.
There's actually a famous answer about this: "RegEx match open tags except XHTML self-contained tags".
If you decide to use an HTML/XML parser, take a look at "Best XML parser for Java" for your options :)
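To make the parser suggestion concrete, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the fragment is well-formed XML once wrapped in a single root element, which happens to be true for the sample string in the question but not for arbitrary HTML; for real-world pages a dedicated HTML parser would be the safer choice. The class name AnchorTextParser and the method name extractLinkTexts are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnchorTextParser {

    // Hypothetical helper: returns the text content of every <a> element.
    // Assumption: the fragment is well-formed XML after wrapping it in <root>.
    public static List<String> extractLinkTexts(String fragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // XML needs exactly one root element, so wrap the fragment.
            String wrapped = "<root>" + fragment + "</root>";
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

            NodeList anchors = doc.getElementsByTagName("a");
            List<String> results = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                results.add(anchors.item(i).getTextContent());
            }
            return results;
        } catch (Exception e) {
            // Parsing fails on input that is not well-formed XML.
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<a href=\"javascript:yyy_getDetail('123450')\">123450</a>, "
                + "<a href=\"javascript:yyy_getDetail('1234501')\">1234501</a>, ";
        System.out.println(extractLinkTexts(sample));
    }
}
```

Unlike a regex, this fails loudly on malformed input instead of silently returning partial matches.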
Upvotes: 1
Reputation: 13033
Here is an example that does what you need. Note that the full match contains the whole anchor element, tags included; the content you are looking for is in group 1.
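import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Build the sample input from the question.
String data = "12345";
String sampleStr = "";
for (int i = 0; i < 10; i++)
{
    data += i;
    sampleStr += "<a href=\"javascript:yyy_getDetail('" + data + "')\">" + data + "</a>" + ", ";
}
// <a[^>]*> matches the opening tag; (.*?) lazily captures the link text up to the next </a>.
Pattern pattern = Pattern.compile("<a[^>]*>(.*?)</a>");
Matcher matcher = pattern.matcher(sampleStr);
while (matcher.find())
{
    // group(1) is the text between the tags; group(0) would be the whole anchor.
    System.out.println("Result " + matcher.group(1));
}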
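Since the question also asks about storing the results in a list, the same pattern can be wrapped in a small helper. This is only a sketch: the class name AnchorTextExtractor and the method name extractLinkTexts are made up for illustration, and it assumes the anchors are not nested and that the link text contains no line breaks (by default `.` does not match line terminators).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorTextExtractor {

    // Opening <a ...> tag, lazily captured link text, closing </a>.
    private static final Pattern ANCHOR = Pattern.compile("<a[^>]*>(.*?)</a>");

    // Hypothetical helper: collects group 1 of every match into a list.
    public static List<String> extractLinkTexts(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String sample = "<a href=\"javascript:yyy_getDetail('123450')\">123450</a>, "
                + "<a href=\"javascript:yyy_getDetail('1234501')\">1234501</a>, ";
        System.out.println(extractLinkTexts(sample)); // prints [123450, 1234501]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call.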
Upvotes: 3