Reputation: 5137
How can I get a string inside double quotes using a regular expression?
I have the following string:
<img src="http://yahoo.com/img1.jpg" alt="">
I want to extract the string http://yahoo.com/img1.jpg alt="" from it.
How can I do this using a regular expression?
Upvotes: 11
Views: 28185
Reputation: 10959
I don't know why you want the alt attribute as well, but this regexp does what you want: group 1 is the URL and group 2 is the alt attribute. I would probably loosen the regexp a bit if there can be several spaces between img and src, or spaces around '='.
Pattern p = Pattern.compile("<img src=\"([^\"]*)\" (alt=\"[^\"]*\")>");
Matcher m =
p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> " +
"<img src=\"http://yahoo.com/img2.jpg\" alt=\"\">");
while (m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
}
Output:
http://yahoo.com/img1.jpg alt=""
http://yahoo.com/img2.jpg alt=""
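As a rough sketch of the loosening mentioned above, the pattern below accepts extra whitespace between img and src, around '=', and before '>'. The class name ImgAttrExtractor and the method extract are made up for illustration, and it still assumes src comes first, alt comes second, and both values are double-quoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class ImgAttrExtractor {
    // \s+ requires at least one whitespace character between tokens;
    // \s* permits optional whitespace around '=' and before '>'.
    private static final Pattern IMG = Pattern.compile(
        "<img\\s+src\\s*=\\s*\"([^\"]*)\"\\s+(alt\\s*=\\s*\"[^\"]*\")\\s*>");

    // Returns "url alt-attribute" for every matching img tag in the input.
    static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            // Group 1 is the URL, group 2 is the alt attribute as written.
            results.add(m.group(1) + " " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<img   src = \"http://yahoo.com/img1.jpg\"  alt=\"\">";
        for (String s : extract(html)) {
            System.out.println(s);
        }
    }
}
```

Group 2 is captured verbatim, so any whitespace inside the alt attribute in the input is preserved in the result.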
Upvotes: 10
Reputation: 114847
This should do the job:
String url = "";
Pattern p = Pattern.compile("(?<=src=\")[^\"]*(?=\")");
Matcher m = p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">");
if (m.find())
url = m.group();
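The pattern matches every character except " that comes after src=" and before the closing ". The lookbehind (?<=src=") and the lookahead (?=") only check their surroundings, so the quotes themselves are not part of the match.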
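To show the same lookaround pattern on input with more than one tag, here is a small self-contained sketch. The class name SrcLookaround and the method srcValues are invented for this example, and it assumes the attribute is always written exactly as src=" with no extra spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class SrcLookaround {
    // (?<=src=") asserts that src=" precedes the match without consuming it,
    // [^"]* consumes everything up to the next double quote,
    // (?=") asserts that a double quote follows, again without consuming it.
    private static final Pattern SRC = Pattern.compile("(?<=src=\")[^\"]*(?=\")");

    // Collects the value of every src="..." attribute in the input.
    static List<String> srcValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            // No capture group is needed: the whole match is the value.
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> "
                    + "<img src=\"http://yahoo.com/img2.jpg\" alt=\"\">";
        for (String value : srcValues(html)) {
            System.out.println(value);
        }
    }
}
```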
Upvotes: 2
Reputation: 13574
You can do it like this:
Pattern p = Pattern.compile("<img src=\"(.*?)\".*?>");
Matcher m = p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">");
if (m.find())
System.out.println(m.group(1));
However, if you're parsing HTML, consider using a library: regular expressions are not a good tool for parsing HTML. I have had good experiences with jsoup. Here's an example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String fragment = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">";
Document doc = Jsoup.parseBodyFragment(fragment);
Element img = doc.select("img").first();
String src = img.attr("src");
System.out.println(src);
Upvotes: 8