Ammu

Reputation: 5137

Double quotes in Regular expression

How can I get a string inside double quotes using a regular expression?

I have the following string:

<img src="http://yahoo.com/img1.jpg" alt="">

I want to extract http://yahoo.com/img1.jpg alt="" from it. How can I do this using a regular expression?

Upvotes: 11

Views: 28185

Answers (3)

Kaj

Reputation: 10959

I don't know why you want the alt attribute as well, but this regexp does what you want: group 1 is the URL and group 2 is the alt attribute. I would probably modify the regexp a bit if there can be several spaces between img and src, or spaces around '='.

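import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the src URL; group 2 captures the whole alt attribute.
Pattern p = Pattern.compile("<img src=\"([^\"]*)\" (alt=\"[^\"]*\")>");
Matcher m = p.matcher(
    "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> " +
    "<img src=\"http://yahoo.com/img2.jpg\" alt=\"\">");

// find() advances to the next match, so every img tag in the input is printed.
while (m.find()) {
    System.out.println(m.group(1) + "  " + m.group(2));
}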

Output:

http://yahoo.com/img1.jpg  alt=""
http://yahoo.com/img2.jpg  alt=""
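
The whitespace tweak mentioned above could look roughly like the sketch below. The class name TolerantImgPattern and the extract helper are made up for illustration; the only change to the pattern is \s+ between tokens and \s* around '=' (and before the closing '>'). It still assumes src comes first and alt second, as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class TolerantImgPattern {
    // \s+ allows one or more whitespace characters between tokens;
    // \s* allows optional whitespace around '=' and before '>'.
    static final Pattern IMG = Pattern.compile(
        "<img\\s+src\\s*=\\s*\"([^\"]*)\"\\s+(alt\\s*=\\s*\"[^\"]*\")\\s*>");

    // Returns one "url  alt" string per matching tag, in the order found.
    static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            results.add(m.group(1) + "  " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> "
                    + "<img   src = \"http://yahoo.com/img2.jpg\"  alt = \"\" >";
        for (String line : extract(html)) {
            System.out.println(line);
        }
    }
}
```

Group 2 keeps whatever spacing the tag had, so the second tag here prints as alt = "" rather than alt="".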

Upvotes: 10

Andreas Dolk

Reputation: 114847

This should do the job:

String url = "";
// The lookbehind (?<=src=") and lookahead (?=") keep the quotes out of the match.
Pattern p = Pattern.compile("(?<=src=\")[^\"]*(?=\")");
Matcher m = p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">");
if (m.find())
    url = m.group();

The pattern matches every character except " that follows src=" and comes before the closing ".
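
If the input holds more than one tag, the same pattern can be run in a find() loop. The sketch below is only an illustration (SrcLookaround and findAll are invented names). Note that the lookbehind checks only for src=" and not for the img tag itself, so any src attribute matches, including one on a script tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class SrcLookaround {
    // Match runs of non-quote characters that are preceded by src=" and
    // followed by a closing quote; the quotes are not part of the match.
    static final Pattern SRC = Pattern.compile("(?<=src=\")[^\"]*(?=\")");

    // Collects every src value in the input, regardless of which tag owns it.
    static List<String> findAll(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> "
                    + "<script src=\"app.js\"></script>";
        System.out.println(findAll(html));
    }
}
```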

Upvotes: 2

MarcoS

Reputation: 13574

You can do it like this:

Pattern p = Pattern.compile("<img src=\"(.*?)\".*?>");
Matcher m = p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">");
if (m.find())
  System.out.println(m.group(1));

However, if you're parsing HTML, consider using a library: regular expressions are not a good tool for parsing HTML. I have had good experiences with jsoup. Here's an example:

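import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String fragment = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">";
Document doc = Jsoup.parseBodyFragment(fragment);
// first() returns the first matching element, or null if there is none.
Element img = doc.select("img").first();
String src = img.attr("src");
System.out.println(src);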

Upvotes: 8
