I'm trying to match markdown-like tags with a recursive regex.
Input Syntax
(TYPE: VALUE ATTR_KEY: ATTR_VALUE)
Note that a tag must start with ( followed by [a-z0-9_-]+:
Sample Inputs:
(image: sky.jpg)
(image: sky.jpg caption: Sky (Issue This) View)
(link: https://stackoverflow.com text: Stack Overflow)
(link: https://stackoverflow.com text: Stack Overflow rel=nofollow)
(video: http://www.youtube.com/watch?v=49Kh1mS4Fhs)
Currently I'm using the following regex:
(?=[^\]])\([a-z0-9_-]+:.*?\)
But it fails on the second sample, because it matches only:
(image: sky.jpg caption: Sky (Issue This)
Expected match:
(image: sky.jpg caption: Sky (Issue This) View)
If parentheses are nested inside the tag, the lazy .*? stops at the first ), so the match is cut short.
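A quick reproduction, assuming Python's re module (the question does not name a language, but the lookahead and the lazy quantifier behave the same way there). The lazy .*? gives up at the first ) it can reach, which is the one closing the nested (Issue This):

```python
import re

# The pattern from the question: a lookahead, "(", a tag name, ":", then a
# lazy ".*?" that stops at the first ")" it can reach.
LAZY_PATTERN = re.compile(r"(?=[^\]])\([a-z0-9_-]+:.*?\)")

sample = "(image: sky.jpg caption: Sky (Issue This) View)"
match = LAZY_PATTERN.search(sample)
print(match.group(0))  # (image: sky.jpg caption: Sky (Issue This)
```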
I tried the following recursive patterns, and they work, but I need to restrict the match to tags that start with those characters.
(?s)\((?:[^()]+|(?R))*+\)
\((?:[^)(]+|(?R))*+\)
Answer
You should use a positive lookahead to make sure the match starts with that pattern, but you will have to wrap the whole parentheses-matching pattern in another capturing group and use a (?1) subroutine instead of (?R), so that only that group is recursed, not the whole regex (with (?R), every nested group would also have to pass the lookahead, so (Issue This) would fail):
(?=\([a-z0-9_-]+:)(\((?:[^()]+|(?1))*+\))
^^^^^^^^^^^^^^^^^^^            ^^^^     ^
See the regex demo.
Details
(?=\([a-z0-9_-]+:) - a positive lookahead that requires a (, then 1+ lowercase ASCII letters, digits, underscores or hyphens, then a :, immediately to the right of the current location
(\((?:[^()]+|(?1))*+\)) - Capturing group 1 (it will be recursed later):
    \( - a literal (
    (?:[^()]+|(?1))*+ - 0 or more repetitions (possessive, so no backtracking) of either 1+ chars other than ( and ), or the whole Group 1 pattern (recursed)
    \) - a literal )
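Python's standard re module does not support (?1) or (?R), so the sketch below is not the regex itself: it reproduces the same logic with a small hand-written scanner, for illustration only. TAG_START, balanced_end and find_tags are hypothetical names. At each position it checks the lookahead condition, then walks forward counting nesting depth, which is what the recursive (?1) call does implicitly. (If you want to keep the regex in Python, the third-party regex package supports recursive patterns.)

```python
import re

# Stand-in for the lookahead (?=\([a-z0-9_-]+:): "(" + tag name + ":".
TAG_START = re.compile(r"\([a-z0-9_-]+:")


def balanced_end(text, start):
    """Return the index just past the ")" that balances the "(" at
    text[start], or None if it never balances.

    Counting depth plays the role of the (?1) recursion: every nested "("
    opens one more level, every ")" closes one.
    """
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def find_tags(text):
    """Yield non-overlapping tags from left to right, like re.finditer."""
    pos = 0
    while pos < len(text):
        if TAG_START.match(text, pos):
            end = balanced_end(text, pos)
            if end is not None:
                yield text[pos:end]
                pos = end  # continue after the match, as finditer does
                continue
        pos += 1  # no tag here: try the next position


text = ("See (image: sky.jpg caption: Sky (Issue This) View) and "
        "(link: https://stackoverflow.com text: Stack Overflow).")
for tag in find_tags(text):
    print(tag)
# (image: sky.jpg caption: Sky (Issue This) View)
# (link: https://stackoverflow.com text: Stack Overflow)
```

Unbalanced input such as (image: sky.jpg with no closing ) yields nothing, just as the possessive regex fails at that position instead of backtracking.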
If you also want to support smileys, you can add their patterns to the alternation group that contains the regex subroutine, as the first alternative:
(?=\([a-z0-9_-]+:)(\((?::[)(]|[^()]|(?1))*+\))
                        ^^^^^
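I added :[)(], which matches :) or :(, and removed the + after [^()] so that the string inside nested parentheses is checked character by character. Otherwise [^()]+ would consume the colon together with the preceding text, and the ) of the smiley would then be treated as a closing parenthesis.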
Feel free to adjust it to your needs, or add more smiley patterns.
See this regex demo with the following regex, which supports more smileys:
(?=\([a-z0-9_-]+:)(\((?::(?:[()pPDd*oO]|'\()|<3|;\)|[^()]|(?1))*+\))
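The same scanner idea, extended for smileys, again as a Python sketch rather than the regex itself (PAREN_SMILEYS, balanced_end and find_tags are hypothetical names). Only smileys that contain a parenthesis matter for nesting: :) and :( mirror the :[)(] alternative, and :'( and ;) come from the longer pattern above. Smileys like :D or <3 are ordinary characters as far as balancing goes, so the sketch leaves them out. Smileys are checked at every position before the single-character checks, which is the effect of putting them first and dropping the + after [^()].

```python
import re

# Stand-in for the lookahead (?=\([a-z0-9_-]+:): "(" + tag name + ":".
TAG_START = re.compile(r"\([a-z0-9_-]+:")

# Smileys containing a parenthesis, treated as plain text. ":)" and ":("
# mirror the :[)(] alternative; ":'(" and ";)" come from the longer demo
# pattern. Smileys without parentheses (:D, <3, ...) cannot affect nesting.
PAREN_SMILEYS = (":)", ":(", ":'(", ";)")


def balanced_end(text, start, smileys=PAREN_SMILEYS):
    """Return the index just past the ")" balancing text[start], or None."""
    depth = 0
    i = start
    while i < len(text):
        # Check smileys first at every single position, like putting the
        # smiley alternative first and dropping the + after [^()].
        smiley = next((s for s in smileys if text.startswith(s, i)), None)
        if smiley is not None:
            i += len(smiley)
            continue
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None


def find_tags(text, smileys=PAREN_SMILEYS):
    """Yield non-overlapping tags from left to right, like re.finditer."""
    pos = 0
    while pos < len(text):
        if TAG_START.match(text, pos):
            end = balanced_end(text, pos, smileys)
            if end is not None:
                yield text[pos:end]
                pos = end
                continue
        pos += 1


for tag in find_tags("(text: Nice view :) (see: caption (b)) ok)"):
    print(tag)
# (text: Nice view :) (see: caption (b)) ok)
```

Passing smileys=() gives back the behavior without smiley support, where the ) of :) closes the tag early.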