Reputation: 194
I'm not sure if recursion is the correct way to characterize what's occurring in this pattern, but unfortunately I'm too new to regex to build something that conforms to how this pattern can vary and avoids nested groups.
So the pattern is basically defined as:
@param {item} {label}:{text} {labeln}:{textn}
where labeln and textn are some Nth instance of the label:text group.
So an example would be
/**
*
* @param name1 test1:this is text for test1 test2:this is text for test2
* @param name2 test3:this is text for test3 test4:this is text for test4 test5:this is text for test5
*
*/
Now ideally I'm trying to capture name1, test1:this is text for test1, and test2:this is text for test2 as matching groups. Same goes for the name2 line. Of course there can be many more lines like the name1 one, and the pseudo "named parameters" can vary from none to many.
Edit: Colons would not be permitted within the label text since they're reserved as delimiters. The label is strictly alphanumeric; the label text would probably be restricted to a-zA-Z0-9_,'"-
First question is... is this a recursion problem or did I mischaracterize this?
Second question is... is it possible and if so, how can I achieve this?
Upvotes: 1
Views: 104
Reputation: 20486
Preface:
For the sake of explanation, I decided to clarify your "labels" by preceding them with a %. This can be any reserved symbol or other pattern that helps clear up what is a label/text:
/**
* @param variable_a %label:This is variable: a %required:true
* @param variable_b %required:false %pattern:/[a-zA-Z:]/
*/
Problem:
The problem with capturing repetitive patterns in regular expressions is that you can't have an unknown number of capture groups (i.e. you either need to match globally and get a variable number of matches, or capture a fixed number of groups in each match):
@param (?# find a param)
\s* (?# whitespace)
(\w+) (?# capture the variable)
\s* (?# whitespace)
(?: (?# start non capturing group)
%(\w+): (?# capture the label)
([^%\n]+) (?# capture the text)
)+ (?# repeat the non-capturing group)
In this example, I put the label/text capturing code in a non-capturing group that repeats one or more times. This lets us match the whole string, but only the last label/text pair is captured (since we only have 3 groups: variable, label, and text).
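To see the "only the last pair survives" behaviour concretely, here is a minimal sketch using Python's standard re module. Python is my choice for illustration only (the answer itself talks about PCRE), and the names PATTERN and line are made up for this example:

```python
import re

# The pattern from above with the inline comments removed.
PATTERN = re.compile(r"@param\s*(\w+)\s*(?:%(\w+):([^%\n]+))+")

line = "* @param variable_b %required:false %pattern:/[a-zA-Z:]/"
match = PATTERN.search(line)

# Only three groups exist, so the repeated group keeps its LAST iteration:
# the %required:false pair is matched but not retained.
groups = match.groups()
print(groups)  # ('variable_b', 'pattern', '/[a-zA-Z:]/')
```

The whole line is matched, but the first label/text pair is overwritten by the second one.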
Straightforward Solution:
Instead of this, you can just match the whole string and then parse the label/text string after-the-fact:
(?# match the whole string)
@param (?# find a param)
\s* (?# whitespace)
(\w+) (?# capture the variable)
\s* (?# whitespace)
(.*) (?# capture the labels/texts)
(?# parse the label/text string)
% (?# the start of a label)
(\w+) (?# capture label)
: (?# end of label)
([^%]+) (?# capture text)
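The two-step approach could look roughly like this in Python (again just a sketch; the function name parse_params, the dict layout, and the stripping of trailing whitespace from each text are my own assumptions, not part of the patterns above):

```python
import re

# Step 1: grab the variable and the rest of the line.
PARAM = re.compile(r"@param\s*(\w+)\s*(.*)")
# Step 2: split that rest into label/text pairs.
PAIR = re.compile(r"%(\w+):([^%]+)")

def parse_params(doc):
    """Return {variable: {label: text}} for every @param line in doc."""
    result = {}
    for name, rest in PARAM.findall(doc):
        # strip() drops the space that separates one pair from the next.
        result[name] = {label: text.strip() for label, text in PAIR.findall(rest)}
    return result

doc = """/**
 * @param variable_a %label:This is variable: a %required:true
 * @param variable_b %required:false %pattern:/[a-zA-Z:]/
 */"""

parsed = parse_params(doc)
print(parsed)
```

Because . does not match a newline by default, (.*) stops at the end of each @param line, so every line is parsed independently.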
Awesome Solution:
Finally, we can use some regular expression magic to do a global match of all label/text combinations. This means we will have a fixed set of 3 capture groups (variable, label, text) and a variable number of matches. I think this one is best to show and then explain, so here is the crazy awesome regex magic:
(?: (?# start non-capturing group)
@param (?# find a param)
\s* (?# whitespace)
(\w+) (?# capture the variable)
\s* (?# whitespace)
| (?# OR)
\G (?# start back over from our last match)
) (?# end non-capturing group)
%(\w+): (?# capture the label)
([^%\n]+) (?# capture the text)
This one revolves around the PCRE magic of \G, which matches at the position where the last match ended. So we start a non-capturing group that contains the "prefix" of a @param definition. It will either match and capture the variable OR pick up from the end of our last match. Then we match/capture 1 label/text group. The next time the pattern is applied, we start where we left off, the variable capture group is blank (since the variable doesn't exist that far into the string; you'll have to use logic to know which variable you are on), and we capture another label/text group. This continues until we hit a new line, since I said a text can't contain % or \n. The next match attempt will then find a new variable defined by @param. I think this will be your best option; it just takes some more logic on your end.
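Not every engine has \G. Python's built-in re module, for one, does not support it, so the sketch below emulates the same idea rather than using the pattern verbatim: compiled_pattern.match(text, pos) only succeeds exactly at pos, which plays the role of \G. The names HEAD, PAIR and iter_pairs are hypothetical, and the loop also carries the "which variable am I on" logic mentioned above:

```python
import re

HEAD = re.compile(r"@param\s*(\w+)\s*")   # the "prefix" branch
PAIR = re.compile(r"%(\w+):([^%\n]+)")    # one label/text group

def iter_pairs(doc):
    """Yield (variable, label, text) triples, one per label/text pair."""
    pos = 0
    while True:
        head = HEAD.search(doc, pos)
        if head is None:
            return
        variable = head.group(1)  # remembered for all pairs on this line
        pos = head.end()
        # Anchored at pos, like \G: continue only where the last match ended.
        pair = PAIR.match(doc, pos)
        while pair:
            yield variable, pair.group(1), pair.group(2).strip()
            pos = pair.end()
            pair = PAIR.match(doc, pos)

doc = """/**
 * @param variable_a %label:This is variable: a %required:true
 * @param variable_b %required:false %pattern:/[a-zA-Z:]/
 */"""

triples = list(iter_pairs(doc))
for triple in triples:
    print(triple)
```

Each pair becomes its own result, and the chain breaks at the newline because the text class excludes \n, so the next search has to find a fresh @param.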
Upvotes: 4
Reputation: 3446
Well, if you allow your middle label to contain a : but you don't allow it in your end label, I believe the RegEx below should work well enough:
@param\s+(.+?)\s+(.+:.+)\s+([^:]+:[^:]+)$
However, it won't work if your pattern spans multiple lines.
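As a quick check of what this pattern captures, here is a small Python sketch run against the two example lines from the question (Python and the variable names are assumptions for illustration). Note that the pattern always has exactly three groups, so with more than two pairs everything except the last pair ends up in the middle group:

```python
import re

PATTERN = re.compile(r"@param\s+(.+?)\s+(.+:.+)\s+([^:]+:[^:]+)$")

two_pairs = "* @param name1 test1:this is text for test1 test2:this is text for test2"
three_pairs = ("* @param name2 test3:this is text for test3 "
               "test4:this is text for test4 test5:this is text for test5")

groups_two = PATTERN.search(two_pairs).groups()
groups_three = PATTERN.search(three_pairs).groups()

print(groups_two)
print(groups_three)
```

For the name2 line, the middle group holds both test3 and test4 together, so it would still need to be split afterwards.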
Also, if you're trying to parse PHPDoc or some variant thereof, you should write your own parser rather than using RegEx.
Upvotes: 0