Reputation: 20865
i want to download a site an replace all links on that site to an internal link.
that's easy:
$page=file_get_contents($url);
$local=$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];
$page=preg_replace('/href="(.+?)"/','href="http://'.$local.'?href=\\1"',$page);
but i want to exclude all css files and js files from replacing, so i tried:
$regex='/href="(.+?(?!(\.js|\.css)))"/';
$page=preg_replace($regex,'href="http://'.$local.'?href=\\1"',$page);
but that didnt work,
what am i doing wrong?
i thought
?!
is a negative lookahead
Upvotes: 0
Views: 1094
Reputation: 145482
To answer your regex question, you need a lookbehind there and better limit the match with a character class:
$regex = '/href="([^"]+(?<!\.js|\.css))"/';
The charclass first matches the whole link content, then asserts that this didn't end in .js
or .css
.
You might want to augment the whole match with <a\s[^>]*?
even, so it really just finds anything that looks like a link.
Another option would be using domdocument or querypath for such tasks, which is usually tedious and more code, but simpler to add programmatic conditions to:
htmlqp->find("a") FOREACH $a->attr("href", "http:/...".$a->attr("href"))
// would need a real foreach and an if and stuff..
Upvotes: 4