Reputation: 913
I'm dealing with this crazy string, which is the output of a PDF-to-text conversion framework.
I'll post it at the end, but it is probably easier to decipher here: https://regex101.com/r/DxXupz/1
I figured out how to match the contents between 1. and 2. using this regex:
1\.(.*?)2\.
But as you can see, the $string I'm dealing with has all sorts of numerics and decimals and the like, and the numbering goes all the way up to 11.
Is there a regex solution that captures all the numbered list items in one preg_match_all call? For example, using the regex above for 1. to 2.:
preg_match_all('/1\.(.*?)2\./s', $string, $matches);
I'd like it to bring back the contents from 1. to 2., 2. to 3., and so forth.
$string = "1. CZ243 96V DC
20
0pcs
11.35U
SD 220
.
00
USD
2
”
,74mm/s
25lbs .
2.
CV243 96V DC
10
0pcs
11.35USD 1135
.00
USD
4
”
,74mm/s
25lbs
3
. CV243 96V DC
150pcs 12.20
U
SD 1830.00
USD
6
”
,74mm/s
25lbs .
4. CV243 96V DC
100
pcs 13.50
1USD 1350.00
USD
8
”
,74mm/s
25lbs .
5
. CV243 96V DC
50
pcs
15.00USD
750.00
USD
10
”
,74mm/s
25lbs .
6. CV243 96V DC
200pcs
15.00USD
3000.00
USD
12
”
,74mm/s
25lbs .
7
. CV243 96V DC
50pcs
16.00USD 800.00
USD
14
”
,74mm/s
25lbs .
8. CV243 96V DC
75pcs 16.50
USD
1237.50
USD
16
”
,74mm/s
25lbs .
9. CV243 96V DC
5
0pcs
18.46USD
923.00
USD
18
”
,74mm/s
25lbs .
10.CV243 96V DC
50pcs
18.46USD
923.00
USD
20
”
,74mm/s
25lbs .
11.
CV243 96V DC
5
0pcs
20.77USD 1038.50
USD
24
”
,74mm/s
25lbs .
";
Upvotes: 0
Views: 61
Reputation: 147206
This regex should give you the results you want:
\d+\s*\.\s*(CV243 96V DC.*?)(?=\d+\s*\.\s*CV243 96V DC|$)
It looks for some digits, followed optionally by whitespace, a period, some possible whitespace and the string CV243 96V DC. It then grabs all the characters up to the next occurrence of the starting pattern or the end of the string (asserted using a positive lookahead so the characters are not captured in that match). In PHP:
preg_match_all('/\d+\s*\.\s*(CV243 96V DC.*?)(?=\d+\s*\.\s*CV243 96V DC|$)/s', $string, $matches);
print_r($matches[1]);
The output is somewhat messy, so I won't repeat it all here, but you can see this in operation in this demo. Here are the first two values:
[0] => CV243 96V DC 20 0pcs 11.35U SD 220 . 00 USD 2 ” ,74mm/s 25lbs .
[1] => CV243 96V DC 10 0pcs 11.35USD 1135 .00 USD 4 ” ,74mm/s 25lbs
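Since $string contains line breaks, the raw captures contain them too. If you want each item on a single line, as in the values shown above, one option is to collapse the whitespace after matching. This is only a sketch: $sample is a shortened, hypothetical stand-in for the real string, and $items is just an illustrative name.

```php
<?php
// Hypothetical shortened sample in the same shape as the question's data.
$sample = "1. CV243 96V DC\n20\n0pcs\n11.35U\nSD 220\n.\n00\nUSD\n25lbs .\n2.\nCV243 96V DC\n10\n0pcs\n25lbs\n";

// Same regex as above, unchanged.
$pattern = '/\d+\s*\.\s*(CV243 96V DC.*?)(?=\d+\s*\.\s*CV243 96V DC|$)/s';
preg_match_all($pattern, $sample, $matches);

// Trim each capture, then collapse every run of whitespace
// (including newlines) into a single space.
$items = array_map(
    fn(string $item): string => preg_replace('/\s+/', ' ', trim($item)),
    $matches[1]
);

print_r($items);
```

The arrow function needs PHP 7.4 or later; on older versions an ordinary anonymous function does the same job.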
Note
I've assumed your data is supposed to start with 1. CV243, not 1. CZ243. If it is supposed to start with 1. CZ243 and you still want to capture that, change the CV243 in the regex to C[VZ]243.
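To illustrate the difference, here is a rough comparison of the two patterns on a shortened, hypothetical sample in which only item 1 uses CZ243. The variable names ($strict, $relaxed and so on) are made up for this sketch, and the character class is applied in both the capture group and the lookahead.

```php
<?php
// Hypothetical shortened sample: item 1 uses CZ243, the others use CV243.
$sample = "1. CZ243 96V DC\n20\n0pcs\n25lbs .\n2.\nCV243 96V DC\n10\n0pcs\n25lbs\n3\n. CV243 96V DC\n150pcs 12.20\n25lbs .\n";

// Original pattern: only items starting with CV243 are captured.
$strict  = '/\d+\s*\.\s*(CV243 96V DC.*?)(?=\d+\s*\.\s*CV243 96V DC|$)/s';
// Variant: the character class [VZ] accepts either CV243 or CZ243.
$relaxed = '/\d+\s*\.\s*(C[VZ]243 96V DC.*?)(?=\d+\s*\.\s*C[VZ]243 96V DC|$)/s';

// preg_match_all returns the number of full matches.
$strictCount  = preg_match_all($strict, $sample, $strictMatches);
$relaxedCount = preg_match_all($relaxed, $sample, $relaxedMatches);

echo "CV243 only: $strictCount items\n";
echo "C[VZ]243: $relaxedCount items\n";
// First item found by the relaxed pattern (the CZ243 one).
echo trim($relaxedMatches[1][0]), "\n";
```

On this sample the original pattern skips item 1 entirely, while the variant picks up all three items.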
Upvotes: 1