Frank Ly

Reputation: 649

regular expression meaning

Can anyone explain the meaning behind this regular expression?

'/<div align="left"><a href="(.*?)">(.*?)<\/a><\/div>/s'

From what I know, it searches for all div tags whose align attribute equals left, but for the next part I am lost.

Upvotes: 0

Views: 250

Answers (5)

user1218947

Reputation: 47

'/<div align="left"><a href="(.*?)">(.*?)<\/a><\/div>/s'

The regex above is very specific, as opposed to a generic <a> tag capture. Such specificity (hard-coded text) makes the regex "brittle" (easily broken).

  1. the forward slashes at the beginning and end are delimiters to indicate the string is a regex string. The "s" after the last forward slash is a regex modifier and means that each period in the regex will match all characters including new line characters.

  2. The backslashes before the forward slashes in the closing </a> tag and closing </div> tag are escape characters. The escape characters are needed because of the first forward slash and final forward slash that indicate the string is regex. Hence all forward slashes within the expression must be escaped.

  3. This regex will only work with a div with the exact text shown above. Any additional attributes added to the div will break this regex. Even one extra space within the div will break this regex.

  4. Next, the div must be followed by an <a> tag exactly as shown in the regex. If any additional attributes are added to the <a> tag, this regex would break. For example, if the href value is delimited with single quotes instead of double quotes, the regex will fail to match.

  5. The href can contain any character. The regex lazily consumes the href value until it finds the first closing quote that is immediately followed by ">". For the captured value to be clean, the <a> tag must have only an href attribute and nothing else. The closing </a> tag must be followed immediately by a closing </div> tag, exactly as in the regex.

  6. The primary purpose of the regex is to "capture" the href value and the <a> tag text for some extremely specific html. Typically the capture-match will be output into an array.
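The points above can be checked with a small sketch. It is written in Python rather than PHP, on the assumption that re.DOTALL behaves like the /s modifier here; the extract helper and the sample strings are made up for illustration.

```python
import re

# Python equivalent of the pattern; re.DOTALL stands in for the PHP /s modifier.
# No delimiters are needed in Python, so the forward slashes are not escaped.
PATTERN = re.compile(r'<div align="left"><a href="(.*?)">(.*?)</a></div>', re.DOTALL)

def extract(html):
    """Return (href, text) for the first match, or None if nothing matches."""
    m = PATTERN.search(html)
    return m.groups() if m else None

# Hypothetical inputs, one per failure mode described in the list.
exact = '<div align="left"><a href="page.html">Home</a></div>'
extra_space = '<div  align="left"><a href="page.html">Home</a></div>'
single_quotes = "<div align=\"left\"><a href='page.html'>Home</a></div>"
attr_before = '<div align="left"><a class="nav" href="page.html">Home</a></div>'
attr_after = '<div align="left"><a href="page.html" class="nav">Home</a></div>'
multiline = '<div align="left"><a href="page.html">Home\nagain</a></div>'

for sample in (exact, extra_space, single_quotes, attr_before, attr_after, multiline):
    print(extract(sample))
```

Note that an attribute placed after the href does not stop the match; instead the first group swallows it, so the captured "href" comes back as page.html" class="nav. That is arguably a worse failure than no match at all.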

Upvotes: 1

callumacrae

Reputation: 8433

It searches for anchor tags within divs with align left. It also saves the href and anchor text so that they can be referred to later.

Would match: <div align="left"><a href="#">test</a></div>

There are a couple of things wrong with the regex though: first, the use of the dot operator (".") should be avoided. By default it matches everything but newlines (and with the s modifier it matches those too), meaning that the following would match:

<div align="left"><a href="#">test</a><a href="#">test</a></div>

That would save the href as "#" and the text as "test</a><a href="#">test", because the lazy second group keeps expanding until it reaches the final </a></div>.
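Running the pattern against that input shows which group absorbs the extra markup. This is a Python sketch (re.DOTALL in place of /s); the strict variant using negated character classes is my own suggestion for comparison, not something from the original regex.

```python
import re

original = re.compile(r'<div align="left"><a href="(.*?)">(.*?)</a></div>', re.DOTALL)

# Hypothetical stricter variant: the href may not contain a quote and the
# link text may not contain "<", so neither group can run across tags.
strict = re.compile(r'<div align="left"><a href="([^"]*)">([^<]*)</a></div>')

two_links = '<div align="left"><a href="#">test</a><a href="#">test</a></div>'
one_link = '<div align="left"><a href="#">test</a></div>'

href, text = original.search(two_links).groups()
print(repr(href))   # '#'
print(repr(text))   # 'test</a><a href="#">test'

print(strict.search(two_links))           # None: the div does not hold exactly one link
print(strict.search(one_link).groups())   # ('#', 'test')
```

The stricter pattern simply refuses the two-link div instead of returning a mangled capture, which is usually easier to notice and handle.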

Upvotes: 0

Bohemian

Reputation: 425448

It (tries to) find all anchor tags within left-aligned divs, and

  • Group 1 of the match is the url
  • Group 2 of the match is the link text

FYI, regex and HTML don't play nice together, so "don't try this at home".
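For comparison, here is a rough sketch of the same extraction done with a real HTML parser, using Python's standard-library html.parser. The class and function names (LeftDivLinks, left_div_links) are invented for this example, and it is slightly more permissive than the regex: it accepts a link anywhere inside a left-aligned div, not only as its sole child.

```python
from html.parser import HTMLParser

class LeftDivLinks(HTMLParser):
    """Collect (href, text) for <a> tags nested inside <div align="left">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: is it align="left"?
        self.in_link = False  # True while inside a link we want to record
        self.href = None
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_stack.append(attrs.get("align") == "left")
        elif tag == "a" and any(self.div_stack):
            self.in_link = True
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.in_link:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "a" and self.in_link:
            self.links.append((self.href, "".join(self.text)))
            self.in_link = False

def left_div_links(html):
    parser = LeftDivLinks()
    parser.feed(html)
    parser.close()
    return parser.links

# Extra attributes, single quotes and stray spaces no longer matter.
sample = ("<div class=\"x\"  align='left'><a class=\"nav\" href='a.html'>One</a>"
          "<a href=\"b.html\">Two</a></div><div><a href=\"c.html\">Skip</a></div>")
print(left_div_links(sample))   # [('a.html', 'One'), ('b.html', 'Two')]
```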

Upvotes: 0

frumbert

Reputation: 2427

. means any single character

* means zero or more of the previous item (which is a greedy operator)

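? after a star makes the star lazy (non-greedy): it matches as few characters as possible instead of as many as possible. On its own, ? after an item means "zero or one of the previous item", and when used like this (?:.*) it means "match anything but don't create a backreference".

So href="(.*)" will match the same text as href="(.*?)" only when there is a single candidate ending; when the input contains several links, the greedy version runs on to the last one.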

At any rate your match pattern should be:

$0 will equal the whole div

$1 will equal the value inside the href

$2 will equal the value inside the tag
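The groups, and the effect of the ? after the star, can be checked with a short sketch. It uses Python (group(0), group(1), group(2) correspond to $0, $1, $2) and a made-up input containing two divs.

```python
import re

lazy = re.compile(r'<div align="left"><a href="(.*?)">(.*?)</a></div>', re.DOTALL)
greedy = re.compile(r'<div align="left"><a href="(.*)">(.*)</a></div>', re.DOTALL)

# Hypothetical input with two matching divs.
html = ('<div align="left"><a href="a.html">A</a></div> '
        '<div align="left"><a href="b.html">B</a></div>')

first = lazy.search(html)
print(first.group(0))   # the whole first div
print(first.group(1))   # a.html
print(first.group(2))   # A

lazy_matches = [m.groups() for m in lazy.finditer(html)]
greedy_matches = [m.groups() for m in greedy.finditer(html)]

print(lazy_matches)         # [('a.html', 'A'), ('b.html', 'B')]
print(len(greedy_matches))  # 1: the greedy href group swallows everything up to the last link
```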

You can try out regular expressions online at http://www.regextester.com/ - there's also various apps and widgets for your OS of choice for testing them.

Upvotes: 0

latestVersion

Reputation: 458

Irrespective of what this is for and whether it will work or not (regex is not a good option for matching HTML tags), for the sake of explaining: the second part of the regex, <a href="(.*?)">(.*?)<\/a><\/div>/s, is just "trying" to match all the anchor tags with any URL, followed by the link text displayed for that URL.

When I say "trying", I mean this is what the person who wrote the regex intended to do.

Upvotes: 0
