Reputation: 2473
I want to extract the text between the ':' and '|' characters, but in the second and third columns there is a space after the ':'.
The input:
Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra) - CARPAZO O PE#A LONJA [EXTRACT]
Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 | Garaje | MOJACAR (Almería) - URB.VILLA MIRADOR DEL MAR - MOD. # [EXTRACT]
Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 | Garaje | MOJACAR (Almería) - VILLA MIRADOR DEL MAR-MODULO #-PLAZA 4 [EXTRACT]
Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje | MOJACAR (Almería) - VILLA MIRADOR DEL MAR - MODULO [EXTRACT]
Desired output:
22726
233726
256726
39766
39767
39768
397A5
397B5
397C5
AA39803
P_39803
200_39803
My first pattern: (?<=:)(\w{5,12})
This matches only the first column.
My second pattern: (?<=:\s)(\w{5,12})
This matches only the second and third columns.
So I assumed my third pattern would be the correct one: (?<=:\s?)(\w{5,12})
That pattern doesn't work.
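A minimal sketch reproducing the three attempts with Python 3's standard-library `re` module on the first sample line. The use of `re.findall` and the variable names are illustrative choices, not taken from the question:

```python
import re

# First line of the sample input.
line = ("Referencia:22726| Referencia Cliente Ak: 233726 | "
        "Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra) - "
        "CARPAZO O PE#A LONJA [EXTRACT]")

# Pattern 1: the lookbehind is exactly one character wide (":").
first = re.findall(r"(?<=:)(\w{5,12})", line)

# Pattern 2: the lookbehind is exactly two characters wide (": ").
second = re.findall(r"(?<=:\s)(\w{5,12})", line)

# Pattern 3: "\s?" makes the lookbehind either 1 or 2 characters wide,
# which re rejects at compile time by raising re.error.
try:
    re.compile(r"(?<=:\s?)(\w{5,12})")
    third_compiles = True
except re.error as exc:
    third_compiles = False
    print("pattern 3 rejected:", exc)

print(first)           # ['22726']
print(second)          # ['233726', '256726']
print(third_compiles)  # False
```

So the third pattern does not silently fail to match; the standard `re` module refuses to compile it at all.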
Upvotes: 0
Views: 183
Reputation: 89557
A lookbehind can't be variable-length in Python. One way to solve this:
(?:(?<=:\s)|(?<=:))(\w{5,12})
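A minimal sketch of this alternation with Python 3's `re`, assuming the sample lines are trimmed after the third column (the trimming and the `values` name are illustrative). Each lookbehind branch has its own constant width, which is why `re` accepts it:

```python
import re

# Sample lines from the question, trimmed after the third column.
lines = [
    "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 |",
    "Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 |",
    "Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 |",
    "Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 |",
]

# Two separate fixed-width lookbehinds joined by alternation:
# (?<=:\s) is 2 characters wide, (?<=:) is 1 character wide.
pattern = re.compile(r"(?:(?<=:\s)|(?<=:))(\w{5,12})")

# With exactly one capturing group, findall returns that group's text.
values = [v for line in lines for v in pattern.findall(line)]
print(values)
```

This prints the twelve values in the order of the desired output.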
But since you're already using a capturing group, the lookbehind isn't needed:
:\s?(\w{5,12})
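A minimal sketch of why the lookbehind is unnecessary here, using Python 3's `re` on the same trimmed sample lines (the trimming and variable names are illustrative): the colon and optional space are part of the whole match, but only the group's text is returned:

```python
import re

# Sample lines from the question, trimmed after the third column.
lines = [
    "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 |",
    "Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 |",
    "Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 |",
    "Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 |",
]

# No lookbehind: ":" and the optional space are consumed by the match.
pattern = re.compile(r":\s?(\w{5,12})")

m = pattern.search("Referencia Cliente Ak: 233726 |")
print(repr(m.group(0)))  # ': 233726'  (whole match)
print(repr(m.group(1)))  # '233726'    (captured value only)

# findall returns only group 1 for each match.
values = [v for line in lines for v in pattern.findall(line)]
print(values)
```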
Upvotes: 2
Reputation: 12042
Remove the ? character from the lookbehind and move the \s into the matched part:
(?<=:)(\s?\w{5,12})
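One detail worth checking: because `\s?` is now inside the capturing group, the leading space becomes part of the captured value. A minimal sketch with Python 3's `re` on the first sample line (trimmed after the third column); the `.strip()` clean-up is an addition, not part of this answer:

```python
import re

line = "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 |"

# The optional space sits inside group 1, so it is returned when present.
raw = re.findall(r"(?<=:)(\s?\w{5,12})", line)
print(raw)  # ['22726', ' 233726', ' 256726']

# Stripping surrounding whitespace leaves the bare values.
values = [v.strip() for v in raw]
print(values)  # ['22726', '233726', '256726']
```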
Upvotes: 1