Seerumi

Reputation: 2037

Extracting specific <a href> URLs out of the document

I think this should be elementary, but I still can't get my head around it. Let's say there's a fair number of HTML documents and I need to extract every image URL from them.

The rest of the content changes, but the base of the URL is always the same, for example: http://images.examplesite.com/images/

So I want to extract every string that contains that part. The problem is that the URLs are always wrapped in <a href=''> or <img src=''> tags, so how can I strip those out? preg_match, probably?

Upvotes: 0

Views: 492

Answers (2)

ahmetunal

Reputation: 3960

You can either use an HTML DOM parser or a regular expression:

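  // Dots are escaped so they match a literal "." rather than any character.
  preg_match_all("/http:\/\/images\.examplesite\.com\/images\/(.*?)\"/s", $str, $preg);
  // $preg[0] holds the full matches (including the closing quote), $preg[1] the part after the base URL.
  print_r($preg);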
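
For the DOM parser route, here is a minimal sketch, assuming PHP's built-in DOMDocument and DOMXPath classes are available. The helper name extractImageUrls and the sample markup are made up for illustration and are not from the question:

```php
<?php
// Hypothetical helper: collect <a href> and <img src> values that start with $base.
function extractImageUrls(string $html, string $base): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Select every href attribute on <a> and every src attribute on <img>.
    foreach ($xpath->query('//a/@href | //img/@src') as $attr) {
        // Keep only values that begin with the base URL.
        if (strpos($attr->value, $base) === 0) {
            $urls[] = $attr->value;
        }
    }

    // Drop duplicates and reindex the array.
    return array_values(array_unique($urls));
}

// Sample input (assumed): one matching link, one matching image, one unrelated link.
$html = '<p><a href="http://images.examplesite.com/images/a.jpg">'
      . "<img src='http://images.examplesite.com/images/a_thumb.jpg'></a> "
      . '<a href="http://other.example.com/">other</a></p>';

print_r(extractImageUrls($html, 'http://images.examplesite.com/images/'));
```

Because the parser reads the attribute values itself, it does not matter whether the document uses single or double quotes, and no quote character ends up in the result.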

Upvotes: 0

Narcis Radu

Reputation: 2547

Try something like: preg_match_all('/http:\/\/images\.examplesite\.com\/images\/(.*?)"/i', $html_data, $results, PREG_SET_ORDER);
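
Since the question shows attributes written with single quotes, a variant that stops at either quote style may be useful. This is only a sketch: the sample $html_data and the character class used to end the URL are my assumptions, not something taken from the real documents:

```php
<?php
// Sample input (assumed): one single-quoted and one double-quoted attribute.
$html_data = "<a href='http://images.examplesite.com/images/a.jpg'>"
           . "<img src=\"http://images.examplesite.com/images/b.png\"></a>";

// "~" as the delimiter avoids escaping every slash.
// The URL ends at the first quote, whitespace character, or ">".
$pattern = '~http://images\.examplesite\.com/images/[^"\'\s>]*~i';

preg_match_all($pattern, $html_data, $results, PREG_SET_ORDER);

// With PREG_SET_ORDER each element of $results is one match; index 0 is the full URL.
$urls = array_map(function ($match) {
    return $match[0];
}, $results);

print_r($urls);
```

Here the closing quote is excluded from the match, so each entry in $urls is a clean URL.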

Upvotes: 1
