Python regex: Getting all numbers besides some which are followed by specific terms

Question

The goal is to get all the numbers from a text except those that are either followed or preceded by specific words/characters (dates should be ignored as well). What I am struggling with is the negative lookbehind.

For example: 4.5 $55 1,200 wordA 3 sometext 2 wordB sometext 4.3charA sometext charB21.6 sometext 11/10/22

In the sample, the numbers 3, 2, 4.3, 21.6 and the date 11/10/22 would be ignored.

My attempt https://regex101.com/r/PQvtOl/1/

(\d*\b[\.,]?\d+)(?!\d*? (?:wordB))(?!\d*?(?:charA))((?!\b[charB/])(?!\d+))

Any help would be greatly appreciated!

Wiktor Stribiżew · Accepted Answer

You can use

(?


Get only those matches that are captured into capturing group #1. See the regex demo. Details:

(? - a date-like string: no digit allowed immediately on the left, then one or two digits, /, one or two digits, /, and then two or four digits with no extra digit on the right allowed, or

\b(?:charB|wordA)\s*\d*[.,]?\d+ - a word boundary, then charB or wordA, zero or more whitespaces, zero or more digits, an optional dot or comma, one or more digits
| - or (the next part is captured, and re.findall will only output those in the resulting list, the above ones will be discarded)
(? - no digit, and no digit followed by . or ,, is allowed immediately on the left; then zero or more digits, an optional . or , and one or more digits are captured into Group 1; finally, the negative lookahead fails the match if wordB, charA, or an optional . or , followed by a digit appears immediately on the right after zero or more whitespaces.


See the Python demo:
import re
text = '4.5 $55 1,200 wordA 3 sometext 2 wordB sometext 4.3charA sometext charB21.6 sometext 11/10/22'
rx = r'(?
Output: ['4.5', '55', '1,200']
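
Since the pattern in the demo above is cut off, here is a minimal sketch of the same "match what you want to skip, capture what you want to keep" technique, rebuilt from the prose description of the three alternatives. The exact pattern is an assumption and may differ from the one in the answer. In particular, the final lookahead is written so that the optional whitespace applies only to wordB/charA; otherwise 55 would be rejected because it is followed by a space and a digit, which would contradict the expected output. The names `rx` and `numbers` are only illustrative.

```python
import re

text = ('4.5 $55 1,200 wordA 3 sometext 2 wordB sometext '
        '4.3charA sometext charB21.6 sometext 11/10/22')

# Assumed reconstruction based on the description; not necessarily the
# answer's exact regex. re.VERBOSE lets each branch carry a comment.
rx = re.compile(r'''
    (?<!\d)\d{1,2}/\d{1,2}/\d{2}(?:\d{2})?(?!\d)  # date-like string: matched, not captured
  | \b(?:charB|wordA)\s*\d*[.,]?\d+               # number after charB/wordA: matched, not captured
  | (?<!\d)(?<!\d[.,])                            # do not start in the middle of a number
    (\d*[.,]?\d+)                                 # Group 1: the number to keep
    (?!\s*(?:wordB|charA)|[.,]?\d)                # not followed by wordB/charA or more digits
''', re.VERBOSE)

# With a single capturing group, re.findall returns Group 1 for every match;
# matches from the first two branches yield '' and are filtered out here.
numbers = [m for m in rx.findall(text) if m]
print(numbers)  # ['4.5', '55', '1,200']
```

The first two branches consume the unwanted numbers so that the third branch never gets a chance to match inside them; filtering out the empty strings then leaves only the Group 1 captures.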
