Hypn0tizeR

Reputation: 794

Get all URLs from plain CSS

Let's say we have some CSS in our $plain_css variable:

.slide-pause {
  cursor: url(http://example.com/img/bg/pause.png),url(http://example.com/img/bg/pause.png),auto;
}
.something {
  background-image: url('http://example.com/img/bg/beautiful.png'); /* this URL is quoted */
}

I need to get all URLs from this CSS.

This is how I'm trying to achieve this:

preg_match_all('!url\(\'?http://example.com/.*\)!', $plain_css, $matches);

What $matches returns:

array
  0 => 
  array
    0 => string 'url(http://example.com/img/bg/pause.png),url(http://example.com/img/bg/pause.png)'
    1 => string 'url(http://example.com/img/bg/beautiful.png)'

What I need it to return:

array
  0 => string 'url(http://example.com/img/bg/pause.png)'
  1 => string 'url(http://example.com/img/bg/pause.png)'
  2 => string 'url(http://example.com/img/bg/beautiful.png)'

Upvotes: 0

Views: 64

Answers (2)

Martin Ender

Reputation: 44259

You're a victim of greediness: .* matches as much as it can, so it runs through to the last ) on the line. For a quick fix, replace it with .*? to make it ungreedy. Alternatively, exclude ) from the repeated characters, which is usually preferable because it is more explicit and more efficient:

preg_match_all('!url\(\'?http://example.com/[^)]*\)!', $plain_css, $matches);

Note that you can't convince preg_match_all to return everything in a plain array. You will always get a nested array, with the full matches in one sub-array and each capture group in its own (which is important for capturing). But you can simply take your desired result from $matches[0].
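
A minimal sketch of reading the result out of $matches[0], assuming $plain_css holds the CSS from the question. The name $urls is only illustrative, and the dot in example\.com is escaped here so that it matches a literal dot:

```php
<?php
// Sample CSS from the question.
$plain_css = <<<'CSS'
.slide-pause {
  cursor: url(http://example.com/img/bg/pause.png),url(http://example.com/img/bg/pause.png),auto;
}
.something {
  background-image: url('http://example.com/img/bg/beautiful.png');
}
CSS;

// [^)]* cannot run past the first closing parenthesis,
// so each url(...) token is matched on its own.
preg_match_all('!url\(\'?http://example\.com/[^)]*\)!', $plain_css, $matches);

// $matches[0] is the flat list of full matches.
$urls = $matches[0];
print_r($urls);
```

This should give three entries. The third one keeps its quotes, because the pattern matches them as part of the url(...) token.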

Upvotes: 3

Ethan Brown

Reputation: 27282

You need to make your repetition quantifier lazy (the default is greedy):

preg_match_all('!url\(\'?http://example.com/.*?\)!', $plain_css, $matches);

The only change here is that I added a question mark after the * repetition quantifier. Normally, repetitions are greedy: that is, they match as many characters as they possibly can (and still satisfy the expression). In this case, the greediness of the * quantifier was consuming both url expressions in your input string. Changing to a lazy quantifier fixes the problem.
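
To see the difference in isolation, here is a small sketch that runs a greedy and a lazy pattern over one line of the CSS from the question. The variable names $line, $greedy and $lazy are only illustrative, and the patterns are shortened to url\(...\) to keep the comparison readable:

```php
<?php
// One line of the question's CSS, containing two url(...) values.
$line = 'cursor: url(http://example.com/img/bg/pause.png),url(http://example.com/img/bg/pause.png),auto;';

// Greedy: .* runs to the end of the line, then backtracks to the last ")".
preg_match_all('!url\(.*\)!', $line, $greedy);

// Lazy: .*? expands one character at a time and stops at the first ")".
preg_match_all('!url\(.*?\)!', $line, $lazy);

echo count($greedy[0]), "\n"; // 1: both url(...) values swallowed in one match
echo count($lazy[0]), "\n";   // 2: one match per url(...)
```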

The other way to handle this is to use a negated character class instead of the . metacharacter (which matches any character except a newline):

preg_match_all('!url\(\'?http://example.com/[^)]*\)!', $plain_css, $matches);
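
If you would rather have the bare URLs without the url(...) wrapper or the quotes, a capture group can be added to the negated-class version. This is a hypothetical variation rather than something the question asked for: it assumes URLs may be wrapped in either quote style with optional whitespace, and the names $pattern and $bare_urls are made up for the example.

```php
<?php
// Sample CSS from the question.
$plain_css = <<<'CSS'
.slide-pause {
  cursor: url(http://example.com/img/bg/pause.png),url(http://example.com/img/bg/pause.png),auto;
}
.something {
  background-image: url('http://example.com/img/bg/beautiful.png');
}
CSS;

// Optional whitespace and an optional quote on either side of the URL;
// the URL itself is capture group 1. The negated class stops at a quote,
// a closing parenthesis, or whitespace.
$pattern = '!url\(\s*[\'"]?(http://example\.com/[^\'")\s]*)[\'"]?\s*\)!';
preg_match_all($pattern, $plain_css, $matches);

// $matches[0] holds the full url(...) tokens, $matches[1] the bare URLs.
$bare_urls = $matches[1];
print_r($bare_urls);
```

With the sample CSS this should yield the pause.png URL twice and the beautiful.png URL once, all unquoted.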

Upvotes: 2
