Reputation: 2976
I am trying to remove whitespace and dots from hyperlinks. All rules are working fine, except it's not removing the dot from the URL. Here are a few examples:
<a href=" http://www.example.com ">example site</a>
<a href=" http://www.example.com">example 2</a>
<a href="http://www.example.com.">final example</a>
$text = preg_replace('/<a href="([\s]+)?([^ "\']*)([\s]+)?(\.)?">([^<]*)<\/a>/', '<a href="\\2">\\5</a>', $text);
In the last example the RE should remove the dot from the URL. The dot is optional, so I wrote the rule (\.)? for it.
Upvotes: 0
Views: 117
Reputation:
This will trim up the hrefs (I assume you mean to trim them).
For both ' and " value delimiters (expanded):
(<a \s+ href \s* = \s*)
(?|
(") \s* ([^"]*?) [\.\s]* (")
| (') \s* ([^']*?) [\.\s]* (')
)
([^>]*>)
replacement is: $1$2$3$4$5
Or, for just the " value delimiter (expanded):
(<a \s+ href \s* = \s* ")
\s*
([^"]*?)
[\.\s]*
(" [^>]*>)
replacement is: $1$2$3
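A runnable sketch of the first (branch-reset) pattern, assuming PCRE; the /x modifier lets the pattern keep its expanded layout, and the sample input is made up:

```php
<?php
// Branch reset (?|...) gives both alternatives the same group numbers,
// so $2/$3/$4 refer to the delimiter, trimmed value, and closing delimiter
// regardless of which quote style matched.
$re = '~(<a \s+ href \s* = \s*)
       (?|
           (") \s* ([^"]*?) [.\s]* (")
         | (\') \s* ([^\']*?) [.\s]* (\')
       )
       ([^>]*>)~x';
$text = "<a href=' http://www.example.com. '>example</a>";
$out = preg_replace($re, '$1$2$3$4$5', $text);
echo $out; // <a href='http://www.example.com'>example</a>
```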
Upvotes: 1
Reputation: 4289
Because your dot is already matched by the ([^ "\']*) group. Change it to ([^ "\']*?), the ungreedy (lazy) version.
I also suggest replacing ([\s]+)?(\.)? with [\s.]* to handle strings like "www.example.com. " (dot followed by whitespace).
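A minimal sketch of this fix applied to the question's pattern (sample input taken from the question):

```php
<?php
// Lazy ([^ "\']*?) stops as early as possible, leaving the trailing dot
// for [\s.]* to consume; [\s.]* also handles dots and spaces in any order.
$text  = '<a href=" http://www.example.com. ">final example</a>';
$clean = preg_replace(
    '/<a href="\s*([^ "\']*?)[\s.]*">([^<]*)<\/a>/',
    '<a href="$1">$2</a>',
    $text
);
echo $clean; // <a href="http://www.example.com">final example</a>
```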
Upvotes: 1
Reputation: 28721
The following is untested.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadHTMLFile('source.html');
$xpath = new DOMXPath($doc);
// Select every anchor element anywhere in the document
$query = '//a';
$anchors = $xpath->query($query);
foreach ($anchors as $aElement) {
    $aElement->setAttribute('href', trim($aElement->getAttribute('href'), ' .'));
}
$doc->saveHTMLFile('new-source.html');
Upvotes: 0
Reputation: 11028
How about <a href="([\s]+)?([^ "\']*\.[a-zA-Z]{2,5})([\s]+)?(\.)?">([^<]*)<\/a> ?
The added \.[a-zA-Z]{2,5} will catch .com, .info, .edu and even something like .com.au
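A quick check of this pattern against the question's last example (a sketch, assuming PCRE):

```php
<?php
// ([^ "\']*\.[a-zA-Z]{2,5}) forces the capture to end in a TLD-like suffix,
// so the trailing (\.)? is free to absorb the final dot.
$text = '<a href="http://www.example.com.">final example</a>';
$out  = preg_replace(
    '/<a href="([\s]+)?([^ "\']*\.[a-zA-Z]{2,5})([\s]+)?(\.)?">([^<]*)<\/a>/',
    '<a href="$2">$5</a>',
    $text
);
echo $out; // <a href="http://www.example.com">final example</a>
```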
Upvotes: 1