rubo77


Regular expression for replacing all links except CSS and JS

I want to download a site and replace all links on that site with internal links.

That's easy:

$page=file_get_contents($url);
$local=$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];
$page=preg_replace('/href="(.+?)"/','href="http://'.$local.'?href=\\1"',$page);

But I want to exclude all CSS and JS files from the replacement, so I tried:

$regex='/href="(.+?(?!(\.js|\.css)))"/';
$page=preg_replace($regex,'href="http://'.$local.'?href=\\1"',$page);

But that didn't work.

What am I doing wrong? I thought ?! was a negative lookahead.

Upvotes: 0


Answers (1)

mario


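To answer your question: the negative lookahead after .+? is tested at the position just before the closing quote, where the next character is always ", so it never rejects anything. You need a lookbehind there instead, and it is better to limit the match with a character class: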

$regex = '/href="([^"]+(?<!\.js|\.css))"/';

The character class first matches the whole link content, and the lookbehind then asserts that it didn't end in .js or .css. You might even want to prefix the whole pattern with <a\s[^>]*? so that it only finds things that really look like a link.
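As a rough check of the pattern above, here is a small sketch run against a made-up HTML fragment. The sample markup and the value of $local are assumptions for illustration only; in the question $local is built from $_SERVER. The second pattern is one possible way to add the <a\s[^>]*? prefix mentioned above.

```php
<?php
// Hypothetical values for illustration; not taken from the original question.
$local = 'example.com/proxy.php';
$page  = '<link href="style.css">'
       . '<link rel="icon" href="favicon.ico">'
       . '<a href="app.js">script</a>'
       . '<a class="nav" href="page.html">page</a>';

// Lookbehind version: the captured value must not end in .js or .css.
$regex = '/href="([^"]+(?<!\.js|\.css))"/';
$rewritten = preg_replace($regex, 'href="http://' . $local . '?href=${1}"', $page);

// Variant that additionally requires the href to sit inside an <a ...> tag.
$anchorsOnly = '/(<a\s[^>]*?)href="([^"]+(?<!\.js|\.css))"/';
$rewrittenAnchors = preg_replace(
    $anchorsOnly,
    '${1}href="http://' . $local . '?href=${2}"',
    $page
);

echo $rewritten, "\n";
echo $rewrittenAnchors, "\n";
```

With the first pattern every href that does not end in .js or .css is rewritten, including the favicon link; the second one only touches href attributes inside <a> tags. Note that a URL such as style.css?v=2 would not be excluded by either pattern, because it no longer ends in .css.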

Another option would be using an HTML parser library (such as QueryPath) for such tasks, which is usually more tedious and more code, but makes it simpler to add programmatic conditions:

htmlqp->find("a") FOREACH $a->attr("href", "http:/...".$a->attr("href"))
// would need a real foreach and an if and stuff..
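The "real foreach and an if" could look roughly like the following. This is only a sketch, and it swaps in PHP's built-in DOMDocument instead of QueryPath; rewriteLinks() and the sample values are hypothetical names made up for the example, and the urlencode() call is an addition that the regex version does not have.

```php
<?php
// Sketch using the built-in DOM extension rather than QueryPath.
// rewriteLinks() is a hypothetical helper, not part of any library.
function rewriteLinks(string $html, string $local): string
{
    $doc = new DOMDocument();
    // Suppress the warnings that real-world, non-validating HTML tends to trigger.
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip anchors without an href and links ending in .js or .css.
        if ($href === '' || preg_match('/\.(?:js|css)$/i', $href)) {
            continue;
        }
        // urlencode() is an addition here so the original URL survives as a query value.
        $a->setAttribute('href', 'http://' . $local . '?href=' . urlencode($href));
    }

    // saveHTML() returns the whole document, including doctype and <html>/<body> wrappers.
    return $doc->saveHTML();
}

echo rewriteLinks(
    '<a href="page.html">page</a><a href="app.js">script</a>',
    'example.com/proxy.php'
);
```

Because the condition is ordinary PHP, further rules (other extensions, external hosts, and so on) can be added to the if without touching a regular expression.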

Upvotes: 4
