Lucas Bustamante

Reputation: 17208

preg_split without deleting the search pattern

I have a text file several thousand lines long to parse. It's a product catalog that follows a fixed pattern.

Each product has two serial numbers. I was using one of them to split the whole text into an array, with one product per element.

The problem is that the serial I use in preg_split gets deleted from the product, and I need it.

Here's a raw product:

1532.000028-01532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM 
COM 
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO; 
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s), 
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74

The first numbers are the two serials; they are stuck together because of flaws in the PDF parser.

Here's the regex I'm using to split the text into products:

$texto = preg_split("/([0-9]{4}[.][0-9]{6}[-][0-9]{1})+/",$texto);

Output:

1532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM 
COM 
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO; 
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s), 
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74

As you can see, the first serial is removed from the output. I need it. How can I split the text into products while keeping both serials?

Upvotes: 3

Views: 528

Answers (1)

user3942918

Reputation: 26385

Change your capture group into a lookahead. A lookahead is zero-width, so preg_split splits just before each serial without consuming it:

$texto = preg_split("/(?=[0-9]{4}[.][0-9]{6}[-][0-9]{1})/",$texto);
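To illustrate, here is a minimal sketch run against a shortened, made-up sample. The second product and the variable names $pattern and $produtos are assumptions for the example, not part of the original catalog. It also passes PREG_SPLIT_NO_EMPTY, since a zero-width match at the very start of the text can otherwise leave an empty first element:

```php
<?php
// Hypothetical two-product sample in the question's format (descriptions shortened).
$texto = "1532.000028-01532.213.00010875-8\nTRES ANÉIS\n"
       . "1532.000029-91532.213.00010876-6\nDOIS BRINCOS\n";

// Lookahead: assert that a serial follows, but do not consume it.
$pattern = '/(?=[0-9]{4}[.][0-9]{6}-[0-9])/';

// -1 means no limit; PREG_SPLIT_NO_EMPTY drops any empty pieces.
$produtos = preg_split($pattern, $texto, -1, PREG_SPLIT_NO_EMPTY);

foreach ($produtos as $i => $produto) {
    echo "Product $i:\n$produto\n";
}
```

Each element of $produtos starts with its own first serial, followed immediately by the second serial and the description.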

Upvotes: 8
