Reputation: 2225
I am looking for a regular expression in PHP to parse a string of the following pattern. The command is wrapped in double square brackets, like this:
[[a src="" desc=""]]
where a, src and desc are fixed keywords. src is required but desc is optional; the value of src or desc can be wrapped in double or single quotes, and src and desc can be given in any order. For example, the following patterns are all valid:
[[a src="http://a.c.d" desc ="hello"]]
[[a src ="http://a.c.d" desc= 'hello']]
[[a desc ="hello " src= 'http://a.c.d' ]]
[[a src = "http://a.c.d" ]]
[[a src="http://a.c.d" desc ="hello"]]
Any whitespace around 'a', 'src', 'desc', '=' and the values (outside the quotes) should be ignored. I am going to replace this command with an HTML tag like
<a href="SOMETHING_EXTRACT_FROM_SRC">SOMETHING_EXTRACT_FROM_DESC</a>
It seems pretty tough to think of one regex to do the work. For now I have 3 regexes set up to handle the different cases separately. It looks like this:
$pattern = '/\[\[a[:blank:]+src[:blank:]*=[:blank:]*"(.*?)"[:blank:]+desc[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '<a href="${1}">${2}</a>', $src);
$pattern = '/\[\[a[:blank:]+desc[:blank:]*=[:blank:]*"(.*?)"[:blank:]+src[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '<a href="${1}">${2}</a>', $rtn);
$pattern = '/\[\[a[:blank:]+src[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '<a href="${1}">${2}</a>', $rtn);
But this doesn't work. Regular expressions are hard to learn :(
Upvotes: 2
Views: 114
Reputation: 4916
I wrote a regular expression that matches everything you requested, but it is slightly too permissive, as I'll explain at the end. First, the regex:
\[\[a(\s+(src|desc)\s*=\s*('[^']*'|"[^"]*")){1,2}\s*\]\]
I'll break it down so you can understand it:
\[\[ ... \]\] matches [[ ... ]], the beginning and ending
\s matches any whitespace character (space, tab, newline); \s+ expects at least one
(src|desc) matches either the string src or the string desc. It's an OR operator: match src OR desc.
'[^']*' matches two single quotes and anything in between that is not a single quote
"[^"]*" same with double quotes
('[^']*'|"[^"]*") matches one of the above two
(src|desc)\s*=\s*('[^']*'|"[^"]*") matches a token like src='something'
{1,2} matches something once or twice; appended to the above expression, it matches one or two of those tokens

And that's pretty much it. The only problem is that it will also match this:
[[a src="http://a.c.d" src="http://a.c.d"]]
Which I think should not match. If it doesn't bother you, you're good to go; otherwise you'll need to drop the idea of one big group with alternatives (i.e. |) and take a different approach. You could use look-aheads, for example, but that gets nasty pretty fast.
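As a rough idea of what the look-ahead route could look like, here is a minimal sketch (not a full solution). It assumes at most two tokens, as above, and uses a negative look-ahead on a back-reference so the second keyword cannot repeat the first. The variable names $q, $pattern and $samples are just for illustration. Note that it still accepts desc on its own, exactly like the regex above does.

```php
<?php
// Quoted value: single- or double-quoted, as in the pattern above.
$q = '(?:\'[^\']*\'|"[^"]*")';

// The first token captures its keyword in group 1. The optional second
// token starts with (?!\1), so it cannot use the same keyword again.
$pattern = '/\[\[a\s+(src|desc)\s*=\s*' . $q
         . '(?:\s+(?!\1)(?:src|desc)\s*=\s*' . $q . ')?\s*\]\]/';

$samples = [
    '[[a src="http://a.c.d" desc=\'hello\']]',
    '[[a desc ="hello " src= \'http://a.c.d\' ]]',
    '[[a src = "http://a.c.d" ]]',
    '[[a src="http://a.c.d" src="http://a.c.d"]]',
];

foreach ($samples as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', PHP_EOL;
}
```

The first three samples print "match" and the duplicated-src one prints "no match".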
You can test it online HERE
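To try it from PHP and actually produce the link, something along these lines might work. It is only a sketch: linkify is a made-up helper name, it finds the tag with the regex above and then pulls the tokens out with a second, smaller pattern inside preg_replace_callback. Falling back to the URL as link text when desc is missing is my assumption, since the question doesn't say what should happen in that case.

```php
<?php
// Hypothetical helper (the name is not from the question).
function linkify(string $text): string
{
    // The regex from above, used to locate a whole [[a ...]] command.
    $tag = '/\[\[a(\s+(src|desc)\s*=\s*(\'[^\']*\'|"[^"]*")){1,2}\s*\]\]/';
    // One token; the branch reset (?|...) puts the value in group 2
    // regardless of which quote style was used.
    $attr = '/(src|desc)\s*=\s*(?|\'([^\']*)\'|"([^"]*)")/';

    $result = preg_replace_callback($tag, function (array $m) use ($attr): string {
        // Tokens are matched left to right, so a keyword that appears
        // inside a quoted value is consumed as part of that value.
        preg_match_all($attr, $m[0], $tokens, PREG_SET_ORDER);
        $values = [];
        foreach ($tokens as $t) {
            if (isset($values[$t[1]])) {
                return $m[0]; // duplicate keyword: leave the text untouched
            }
            $values[$t[1]] = $t[2];
        }
        if (!isset($values['src'])) {
            return $m[0]; // src is required
        }
        $href = htmlspecialchars($values['src'], ENT_QUOTES);
        // Assumption: without desc, the URL doubles as the link text.
        $label = htmlspecialchars($values['desc'] ?? $values['src'], ENT_QUOTES);
        return '<a href="' . $href . '">' . $label . '</a>';
    }, $text);

    // preg_replace_callback returns null on error; keep the input then.
    return $result ?? $text;
}

echo linkify('see [[a desc ="hello" src= \'http://a.c.d\' ]] now'), PHP_EOL;
```

Doing the duplicate and missing-src checks in the callback keeps the regex itself simple, at the cost of a second pass over each match.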
The regex is much more readable if I remove the backslashes and the \s parts. This version won't work as-is, but I think it will help you understand it:
[[a ( (src|desc)=('[^']*'|"[^"]*") ){1,2} ]]
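If you want a spread-out version that does still work, PCRE's x modifier is one option: it ignores unescaped whitespace in the pattern and treats # as the start of a comment. Below is a sketch of the same regex written that way (the variable name $pattern is just for illustration). One thing to watch: the comments must not contain the delimiter / or a single quote, since either would end the pattern or the PHP string early.

```php
<?php
$pattern = '/
    \[\[a                          # opening brackets and the keyword a
    (
        \s+ (src|desc)             # whitespace, then one of the keywords
        \s* = \s*                  # equals sign, optional whitespace around it
        ( \'[^\']*\' | "[^"]*" )   # single-quoted or double-quoted value
    ){1,2}                         # one or two such tokens
    \s* \]\]                       # optional whitespace, closing brackets
/x';

var_dump(preg_match($pattern, '[[a src="http://a.c.d" desc =\'hello\']]'));
```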
Upvotes: 1