Trimax

Reputation: 2473

Variable-length lookbehind regex doesn't work in Python

I want to extract the text between the ':' and '|' characters, but in the second and third fields there is a space after the ':'.

The input:

Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra) -  CARPAZO O PE#A LONJA [EXTRACT]
Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 | Garaje | MOJACAR (Almería) -  URB.VILLA MIRADOR DEL MAR - MOD. # [EXTRACT]
Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR-MODULO #-PLAZA 4 [EXTRACT]
Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR - MODULO [EXTRACT]

Desired output:

22726
233726
256726
39766
39767
39768
397A5
397B5
397C5
AA39803
P_39803
200_39803

My first pattern: (?<=:)(\w{5,12}) This matches only the first column.

My second pattern: (?<=:\s)(\w{5,12}) This matches the second and third columns

So I believed that my third pattern was the correct one: (?<=:\s?)(\w{5,12}) But that pattern doesn't work.
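For reference, the failure can be reproduced with a minimal sketch like the one below. It assumes the standard-library re module (the question does not say which regex library was used) and only tries to compile the third pattern; the variable names are illustrative.

```python
import re

# Third pattern from the question: the optional \s? makes the
# lookbehind match either 1 or 2 characters, i.e. variable width.
pattern = r"(?<=:\s?)(\w{5,12})"

raised = False
message = ""
try:
    re.compile(pattern)
except re.error as exc:
    # The standard re module rejects lookbehinds whose width is not fixed.
    raised = True
    message = str(exc)

print(raised, message)
```

So the pattern is rejected at compile time, before any input is examined.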

Upvotes: 0

Views: 183

Answers (2)

Casimir et Hippolyte

Reputation: 89557

A lookbehind can't be variable-length in Python. One way to solve this is to alternate between two fixed-length lookbehinds:

(?:(?<=:\s)|(?<=:))(\w{5,12})

But since you are using a capturing group anyway, the lookbehind is unnecessary:

:\s?(\w{5,12})
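A short sketch comparing the two patterns above, assuming the standard re module and using re.findall, which returns the captured group when a pattern has exactly one. The sample strings are shortened copies of the first and fourth input lines from the question; the variable names are illustrative.

```python
import re

# Shortened copies of two input lines from the question.
lines = [
    "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo",
    "Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje",
]

# Two fixed-width lookbehinds joined by alternation.
lookbehind_re = re.compile(r"(?:(?<=:\s)|(?<=:))(\w{5,12})")
# Plain match of the colon and optional space, capturing only the value.
capture_re = re.compile(r":\s?(\w{5,12})")

lookbehind_results = [m for line in lines for m in lookbehind_re.findall(line)]
capture_results = [m for line in lines for m in capture_re.findall(line)]

for value in capture_results:
    print(value)
```

Both lists contain the same six values here, so the simpler capturing-group pattern is enough as long as the group, not the whole match, is what gets read.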

Upvotes: 2

Nambi

Reputation: 12042

Remove the ? from the lookbehind and move the \s? into the match:

(?<=:)(\s?\w{5,12})
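One thing to watch with this pattern: because \s? now sits inside the capturing group, the captured text keeps the leading space where there is one. A minimal sketch, assuming the standard re module and a shortened copy of the first input line, with str.strip() used as one possible cleanup:

```python
import re

# Shortened copy of the first input line from the question.
line = "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo"

pattern = re.compile(r"(?<=:)(\s?\w{5,12})")

raw = pattern.findall(line)            # captured groups, spaces included
cleaned = [value.strip() for value in raw]  # drop the optional leading space

print(raw)
print(cleaned)
```

Without the strip step, the second and third values would still carry the space that follows the ':'.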

Upvotes: 1
