Reputation: 17383
My web service returns an HTML string like the one below:
{"content":"[caption id=\"attachment_7691\" align=\"aligncenter\" width=\"300\"]<img class=\"wp-image-7691 size-medium\" src=\"http:\/\/smsbaz.org\/wp-content\/uploads\/2015\/07\/funny-sms-exams-300x217.jpg\" alt=\"funny sms exams\" width=\"300\" height=\"217\" \/> funny sms exams[\/caption]\r\n<p style=\"text-align: center\">\u062f\u0631\u0633 \u062e\u0648\u0627\u0646\u062f\u0646 \u0686\u06cc\u0633\u062a\u061f\r\n.\r\n.\r\n.\r\n\u0628\u0647\u062a\u0631\u06cc\u0646 \u0642\u0631\u0635 \u062e\u0648...
I would like to extract all the images, like this one:
(source: smsbaz.org)
I'm using this function, but the size of the returned list is always 0:
public ArrayList<String> getImagesOfFromHtmlString(String str) {
    ArrayList<String> arr_images = new ArrayList<>();
    Pattern pattern = Pattern.compile("(https?://\\s*\\S+\\.(?:jpg|JPEG|png|gif))");
    Matcher m = pattern.matcher(str);
    while (m.find()) {
        arr_images.add(m.group());
    }
    return arr_images;
}
What am I doing wrong?
Upvotes: 1
Views: 199
Reputation: 8743
This is a little bit dangerous, since you could also have relative URLs. Anyway, there seems to be a problem with your character classes: \s, for example, stands for whitespace. I also noticed that you use group(). In this case you don't need to capture, because group() returns the same thing as group(1) in your code. Here is a solution, not perfect, but good enough to extract the URLs:
"src=[\"'](https?://[^\"']+?\\.(?:jpg|JPEG|png|gif))['\"]"
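To show how that pattern could be wired into a method, here is a minimal sketch. The class name ImageUrlExtractor, the method name extractImageUrls, and the sample input in main are made up for illustration, and the sketch assumes the HTML has already been decoded from the JSON response. Unlike the code in the question, it reads group(1), because with this pattern group() would also include the surrounding src="..." text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageUrlExtractor {

    // The pattern from the answer: an absolute http(s) URL inside a quoted
    // src attribute, ending in one of the listed image extensions.
    private static final Pattern IMG_SRC = Pattern.compile(
            "src=[\"'](https?://[^\"']+?\\.(?:jpg|JPEG|png|gif))['\"]");

    static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // group(1) is the URL only; group() would include src="..." too.
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, assumed to be already decoded from JSON.
        String html = "<img class=\"wp-image-7691 size-medium\" "
                + "src=\"http://smsbaz.org/wp-content/uploads/2015/07/funny-sms-exams-300x217.jpg\" />";
        System.out.println(extractImageUrls(html));
    }
}
```

One thing to check on the asker's side: in the raw JSON shown in the question, the slashes are escaped as \/ and the quotes as \", so neither this pattern nor the original https?:// pattern can match that raw text. If the list stays empty, it may be that the response is being searched before the JSON is parsed; decoding it with a JSON library first and then running the extraction on the "content" value would be worth trying.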
Upvotes: 1