Reputation: 1213
I'm trying to get thumbnail URLs from a website using the Jsoup HTML parser. I need to extract all the URLs that end with 60x60.jpg (or .png); every thumbnail URL ends this way.
The problem is that I can get it to work in an ordinary Java project, but it doesn't work in Android (possibly a regex problem).
This code works in a Java project:
List<String> urls = new ArrayList<String>();
Document doc = Jsoup.connect("http://example.com").get();
Elements pngs = doc.select("img[src~=(60x60).(png|jpg)]");
for (Element img : pngs) {
    String url = img.absUrl("src");
    // Skip URLs that were already collected
    if (!urls.contains(url)) {
        urls.add(url);
    }
}
I then print the urls list. It works in Java, but not in the Android project.
In Android, the only selector that works is this one:
Elements pngs = doc.select("img[src$=.jpg]");
It works fine on Android, though I don't need every link ending with .jpg.
I also tried:
Elements pngs = doc.select("img[src~=(60x60)\\.(png|jpg)]");
That still doesn't work, even with a single backslash before .(png|jpg).
So is the problem in the regex? Does it work differently on Android? It can't be a parser problem, since it works in a normal Java project.
Upvotes: 2
Views: 1439
Reputation: 10522
It looks like there's a difference between the Java regex engine and the one in Android's Dalvik runtime.
I would simplify by using the comma selector syntax, which applies an OR to multiple selectors.
E.g.
Document doc = Jsoup.parse("<img src='foo-60x60.png'> <img src='bar-60x60.jpg'>");
Elements images = doc.select("img[src$=60x60.png], img[src$=60x60.jpg]");
System.out.println(images);
Gives:
<img src="foo-60x60.png" />
<img src="bar-60x60.jpg" />
Upvotes: 0
Reputation: 336408
I don't know JSoup or Android's regex implementation, but a regex that finds a string starting with img= and ending with 60x60.jpg or 60x60.png would be
\bimg=.*?60x60\.(jpg|png)\b
Perhaps you could post an excerpt of the text you're trying to parse. Possibly regex isn't the solution to your problem.
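If the regex is still wanted but the selector is the part that misbehaves, one option might be to apply the pattern in plain Java to the src strings instead of inside the selector. The following is only a sketch under assumptions: the class name ThumbnailFilter, the method filterThumbnails and the sample URLs are made up for illustration, and the input list is assumed to hold values you already collected (for example from img.absUrl("src")). It uses only java.util.regex, so it does not depend on how the selector parses the expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class ThumbnailFilter {
    // Matches URLs ending in 60x60.png or 60x60.jpg; the dot is escaped so it is literal.
    private static final Pattern THUMBNAIL = Pattern.compile("60x60\\.(png|jpg)$");

    // Hypothetical helper: keeps only thumbnail URLs, dropping duplicates
    // while preserving the order in which they were first seen.
    static List<String> filterThumbnails(List<String> srcUrls) {
        Set<String> unique = new LinkedHashSet<String>();
        for (String url : srcUrls) {
            if (THUMBNAIL.matcher(url).find()) {
                unique.add(url);
            }
        }
        return new ArrayList<String>(unique);
    }

    public static void main(String[] args) {
        // Sample values standing in for the src attributes read from the page.
        List<String> srcUrls = Arrays.asList(
                "http://example.com/foo-60x60.png",
                "http://example.com/foo-60x60.png",
                "http://example.com/bar-60x60.jpg",
                "http://example.com/banner.jpg");
        System.out.println(filterThumbnails(srcUrls));
    }
}
```

The LinkedHashSet replaces the contains() check from the question's loop, and anchoring the pattern with $ means a URL such as banner.jpg is rejected even though it ends in .jpg.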
Upvotes: 1