Reputation: 2976

Stripping whitespace and dot from hyperlinks

I am trying to remove whitespace and a trailing dot from hyperlinks. All the rules work fine, except that the dot is not removed from the URL. Here are a few examples:

 <a href="   http://www.example.com   ">example site</a>
 <a href="   http://www.example.com">example 2</a>
 <a href="http://www.example.com.">final example</a>


  $text = preg_replace('/<a href="([\s]+)?([^ "\']*)([\s]+)?(\.)?">([^<]*)<\/a>/', '<a href="\\2">\\5</a>', $text);

In the last example the regex should remove the dot from the URL. The dot is optional, so I wrote the rule (\.)?

Upvotes: 0

Answers (4)

user557597

Reputation:

This will trim the hrefs (I assume you mean to trim them).

For both ' and " value delimiters (expanded):

(<a \s+ href \s* = \s*)
(?|
     (") \s* ([^"]*?) [\.\s]* (")
  |  (') \s* ([^']*?) [\.\s]* (')
)
([^>]*>)

replacement is: $1$2$3$4$5

or,

for just the " value delimiter (expanded):

(<a \s+ href \s* = \s* ")
\s* 
([^"]*?)
[\.\s]*
(" [^>]*>)

replacement is: $1$2$3
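
As a rough sketch of how the expanded pattern for both delimiters might be applied in PHP: this assumes the x (extended) modifier, so the layout whitespace in the pattern is ignored, and it relies on the branch-reset group (?| ... ) so both alternatives share group numbers 2 to 4. The function name trim_href_values and the sample input are hypothetical, not part of the answer above.

```php
<?php
// Hypothetical helper: trims whitespace and dots around href values
// using the expanded pattern with the x modifier.
function trim_href_values(string $html): string
{
    $pattern = '~
        (<a \s+ href \s* = \s*)
        (?|
             (") \s* ([^"]*?) [.\s]* (")
          |  (\') \s* ([^\']*?) [.\s]* (\')
        )
        ([^>]*>)
    ~x';

    // $1 = "<a href=", $2/$4 = the quote, $3 = trimmed value, $5 = rest of the tag
    return preg_replace($pattern, '$1$2$3$4$5', $html);
}

echo trim_href_values('<a href="   http://www.example.com.  ">example site</a>'), "\n";
echo trim_href_values("<a href='http://www.example.com. ' class=\"x\">other</a>"), "\n";
```

Because the value group is lazy and is followed by [.\s]*, any run of trailing dots and whitespace ends up outside the captured value.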

Upvotes: 1

pronvit

Reputation: 4289

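Because the dot is already matched by the greedy ([^ "\']*) group, so the optional (\.)? never gets a chance to consume it.

Change the group to ([^ "\']*?), the ungreedy version.

I also suggest replacing ([\s]+)?(\.)? with [\s.]* to handle strings like "www.example.com. ", where the dot is followed by whitespace.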
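
A minimal sketch of the question's preg_replace call with both of these changes applied, run against the three example links. The function name strip_href_padding is hypothetical, and because the optional groups are dropped the replacement uses groups 1 and 2 instead of 2 and 5.

```php
<?php
// Hypothetical helper: lazy URL group followed by [\s.]* so trailing
// dots and whitespace fall outside the capture.
function strip_href_padding(string $text): string
{
    $pattern = '/<a href="\s*([^ "\']*?)[\s.]*">([^<]*)<\/a>/';
    return preg_replace($pattern, '<a href="$1">$2</a>', $text);
}

$links = [
    '<a href="   http://www.example.com   ">example site</a>',
    '<a href="   http://www.example.com">example 2</a>',
    '<a href="http://www.example.com.">final example</a>',
];

foreach ($links as $link) {
    echo strip_href_padding($link), "\n";
}
```

Each of the three links should come out with href="http://www.example.com" and its link text unchanged.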

Upvotes: 1

Ben Rowe

Reputation: 28721

The following is untested.

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile('source.html');

$xpath = new DOMXPath($doc);

// Select every <a> element anywhere in the document
$query = '//a';

$anchors = $xpath->query($query);

foreach ($anchors as $aElement) {
    // Strip surrounding spaces and dots from the href value
    $aElement->setAttribute('href', trim($aElement->getAttribute('href'), ' .'));
}

$doc->saveHTMLFile('new-source.html');

Upvotes: 0

Seyeong Jeong

Reputation: 11028

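How about <a href="([\s]+)?([^ "\']*\.[a-zA-Z]{2,5})([\s]+)?(\.)?">([^<]*)<\/a>? The change is the added \.[a-zA-Z]{2,5} at the end of the URL group, which forces the group to end on a top-level domain and leaves the trailing dot for (\.)?.

It will catch .com, .info, .edu and even something like .com.au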

Upvotes: 1
