Using PHP, I am looking to extract an array from a string that contains a numbered list.
Example string:
The main points are: 1. This is point one. 2. This is point two. 3. This is point three.
would result in the following array:
[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.
The format of the string can vary - e.g.:
1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one 2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.
I have started using preg_match_all with the following pattern:
!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!
but I am unsure how to match the rest of the string up to the next match.
Example available at RegExr
If your input follows your example input, i.e. no "point" contains a number itself, you could use the following regex:
\d+[^\d]*
In PHP, you could use preg_match_all() to capture everything:
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';
$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);
print_r($matches[1]);
This will result in:
Array
(
[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.
)
Again, though: if there are any digits in the points themselves, this won't work, because every digit starts a new match.
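As a quick check of that caveat, here is a minimal sketch that runs the first regex on the example string and on a made-up input with a digit inside a point. The helper name `extractPoints` and the second input string are illustrative assumptions, not part of the original answer. Each match is trimmed because `[^\d]*` also captures the space before the next number.

```php
<?php
// Hypothetical helper: applies the "no digits inside a point" regex
// and trims the whitespace that [^\d]* picks up before the next number.
function extractPoints(string $text): array
{
    preg_match_all('/\d+[^\d]*/', $text, $matches);
    return array_map('trim', $matches[0]);
}

// Works: no digits appear inside the points.
$good = extractPoints('The main points are: 1. This is point one. 2. This is point two. 3. This is point three.');
print_r($good);

// Breaks: the "2" in "2 apples" starts a new, bogus point.
$bad = extractPoints('1. Buy 2 apples. 2. Eat them.');
print_r($bad);
```

The second call returns three elements ("1. Buy", "2 apples.", "2. Eat them.") for what is really a two-point list.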
If you need numbers to appear inside the points, you'll have to define an "anchor" or "end" for each point, such as a period. If you can guarantee that a period (.) appears only at the end of a point (ignoring the one that may follow the leading digit), you could use the following regex:
\d+[.)\s][^.]*\.
It can be dropped into the preg_match_all() call from above just as easily:
preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);
Regex explained:
\d+ # leading number
[.)\s] # followed by a `.`, `)`, or whitespace
[^.]* # any non-`.` character(s)
\. # ending `.`
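A minimal sketch of the period-anchored regex on a made-up input where a point contains a digit (the input string is an illustrative assumption). Because each match runs from the marker to the first following `.`, the "2" in "2 apples" no longer starts a new point:

```php
<?php
// Assumes every point ends with "." and no other "." appears inside a point.
$text = '1. Buy 2 apples. 2. Eat them.';

// No capture group needed: $matches[0] holds the full matches.
preg_match_all('/\d+[.)\s][^.]*\./', $text, $matches);
print_r($matches[0]);
```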
The caveat with the second regex is that a period may only appear at the end of each point (and after the leading digit). Still, I think this rule may be easier to satisfy than the "no numbers in the point" rule; it all depends on your actual input, though.
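To see how much that depends on the input, this sketch runs the same period-anchored regex over the three alternative formats listed in the question. It is only an illustration; the array keys are arbitrary labels:

```php
<?php
// The three alternative formats from the question.
$formats = [
    'commas'   => '1. This is point one, 2. This is point two, 3. This is point three.',
    'no dots'  => '1) This is point one 2) This is point two 3) This is point three',
    'numerals' => '1 This is point one. 2 This is point two. 3 This is point three.',
];

$results = [];
foreach ($formats as $name => $text) {
    preg_match_all('/\d+[.)\s][^.]*\./', $text, $matches);
    $results[$name] = $matches[0];
}
print_r($results);
```

With commas, the `.` after the "2" ends the first match early, so point two is swallowed into point one; with no periods at all, nothing matches; only the "1 This is point one." format comes out as three clean points.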
Use preg_split; it is easier. Just split the string on your numbering format and keep the non-empty results. Modify this to suit your needs:
<?php
// \d+ (not \d) so that multi-digit markers such as "10." are removed whole
$theReg = '/\d+\.|\d+\)|\d+ /';
$theStrs = array(
'1. This is point one, 2. This is point two, 3. This is point3' ,
'1) This is point one 2) This is point two 3) This is point 3' ,
'1 This is point one. 3 This is point three. 4 This is point 4'
);
foreach($theStrs as $str)
print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
?>
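Note that this split consumes the markers, so the pieces come back as " This is point one, " rather than "1. This is point one." as the question asked. If you want to keep the numbers, one option (a sketch, not part of the original answer) is to split on the whitespace in front of each marker using a lookahead, which matches without consuming the marker. The function name `splitKeepingMarkers` and the marker definition (digits, an optional `.` or `)`, then whitespace) are assumptions:

```php
<?php
// Hypothetical helper: split on whitespace that is followed by a marker
// like "2.", "2)" or "2 ". The lookahead (?=...) checks for the marker
// without consuming it, so each piece keeps its leading number.
function splitKeepingMarkers(string $str): array
{
    return preg_split('/\s+(?=\d+[.)]?\s)/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

$inputs = [
    '1. This is point one, 2. This is point two, 3. This is point three.',
    '1) This is point one 2) This is point two 3) This is point three',
    '1 This is point one. 2 This is point two. 3 This is point three.',
];
foreach ($inputs as $str) {
    print_r(splitKeepingMarkers($str));
}
```

On the question's first example, the text before "1." ("The main points are:") comes back as a separate first element, and a digit followed by whitespace inside a point (e.g. "point 4 of 5") would still trigger a split.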