Keith

How to extract an array from a string that contains a numbered list?

Using PHP, I am looking to extract an array from a string that contains a numbered list.

Example string:

The main points are: 1. This is point one. 2. This is point two. 3. This is point three.

should result in the following array:

[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.

The format of the string can vary, for example:

1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one  2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.

I have started using preg_match_all with the following pattern:

!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!

but I am unsure how to match the rest of the string up to the next match.
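To make the problem concrete, here is a minimal sketch (not part of the original question) that runs the pattern above, unchanged, on the example string. The final (\w+) group matches a single word, so each match stops right after "This":

```php
<?php
// The asker's pattern, unchanged. The last group (\w+) matches one word only,
// so nothing after the first word of each point is captured.
$pattern = '!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!';
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full-pattern matches: "1. This", "2. This", "3. This".
print_r($matches[0]);
```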

Example available at RegExr

Answers (2)

newfurniturey

If your input follows your example, i.e. no "point" contains a number itself, you could use the following regex:

\d+[^\d]*

In PHP, you could use preg_match_all() to capture everything:

$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);

print_r($matches[1]);

This will result in:

Array
(
    [0] => 1. This is point one.
    [1] => 2. This is point two.
    [2] => 3. This is point three.
)

Again, though: if the points themselves contain any digits, this won't work.
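A small sketch of both behaviours of \d+[^\d]* (the second input string is made up for illustration). Because [^\d]* only stops at the next digit, each match except the last also keeps the space before the next number, so trimming the results is usually wanted; and a digit inside a point starts a new match:

```php
<?php
$pattern = '/(\d+[^\d]*)/';

// 1) The question's example: strip the trailing space each match keeps.
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';
preg_match_all($pattern, $text, $matches);
$points = array_map('trim', $matches[1]);
print_r($points);

// 2) Hypothetical input where a point contains a digit ("2 apples"):
//    the "2" inside the first point starts a new match, giving three pieces.
$withDigit = '1. Buy 2 apples. 2. Sell them.';
preg_match_all($pattern, $withDigit, $broken);
print_r($broken[1]);
```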

If you need digits to be allowed inside a point, you'll have to define an "anchor" or "end" for each point, such as a period. If you can guarantee that a . appears only at the end of each point (apart from the optional one that follows the leading number), you could use the following regex:

\d+[.)\s][^.]*\.

It can be dropped into the preg_match_all() from above just as easily:

preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);

Regex explained:

\d+        # leading number
[.)\s]     # followed by a `.`, `)`, or whitespace
[^.]*      # any non-`.` character(s)
\.         # ending `.`

The caveat with the second regex is that a . may appear only at the end of each point (and after the leading number). Still, this rule may be easier to satisfy than the "no numbers in the point" rule; it all depends on your actual input.
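A minimal sketch of that trade-off, using a made-up input plus two of the question's alternative formats. A digit inside a point is now harmless, but comma-separated points run together up to the next ".", and the "1)" format, which has no periods, produces no matches at all:

```php
<?php
$pattern = '/(\d+[.)\s][^.]*\.)/';

// Hypothetical input: the "2" inside the first point no longer splits it.
$withDigit = '1. Buy 2 apples. 2. Sell them.';
preg_match_all($pattern, $withDigit, $matches);
print_r($matches[1]);

// Comma-separated points: the first match runs on to the "." after "2".
$commas = '1. This is point one, 2. This is point two, 3. This is point three.';
preg_match_all($pattern, $commas, $commaMatches);
print_r($commaMatches[1]);

// No "." at all, so preg_match_all() finds nothing and returns 0.
$parens = '1) This is point one  2) This is point two 3) This is point three';
$count = preg_match_all($pattern, $parens, $parenMatches);
```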

DhruvPathak

Use preg_split(); it is easier. Split the string on your numbering format and keep only the non-empty results. Modify this to suit your needs:

http://codepad.org/tK6fGCRB

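<?php

// Split on a number followed by ".", ")" or a space.
// \d+ rather than \d, so that a number such as "10." is not split after the "1".
$theReg = '/\d+[.) ]/';
$theStrs = array(
    '1. This is point one, 2. This is point two, 3. This is point3',
    '1) This is point one  2) This is point two 3) This is point 3',
    '1 This is point one. 3 This is point three. 4 This is point 4'
);

foreach ($theStrs as $str) {
    // PREG_SPLIT_NO_EMPTY drops the empty piece in front of the leading number.
    print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
}
?>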
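A short sketch of what splitting produces on the question's original example. Unlike the output asked for in the question, the numbers themselves are removed (they are the delimiters), any text before "1." becomes its own element, and each piece keeps its surrounding whitespace, so trim() is applied here:

```php
<?php
// The question's example, which has text before the first number.
$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

// Split on "<number>." / "<number>)" / "<number> ", dropping empty pieces.
$pieces = preg_split('/\d+[.) ]/', $text, -1, PREG_SPLIT_NO_EMPTY);

// Remove the whitespace left around each piece.
$points = array_map('trim', $pieces);
print_r($points);
```

If the leading numbers must be kept, the match-based preg_match_all() approach from the other answer fits the requested output more directly.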
