Digital Moniker

Reputation: 281

Regex for 10,000 through 100,000 to 150,000,000 and beyond

I'm trying to match dollar amounts between 10,000 and 150,000,000.

I got this from a stack user previously, but it only catches amounts from 1,000,000 through 150,000,000:

(?<!\d)(\d{1,3}(?:,\d{3}){2,})(?!\d)

I tried reworking it for the last hour but couldn't, and regex is a notorious head wreck :D Can anyone update it to start catching from 10,000? Thanks!

Upvotes: 4

Views: 1228

Answers (4)

Wiktor Stribiżew

Reputation: 627190

You can use

(?<!\d)(\d{1,3}(?:,\d{3})+)(?!\d)

See the regex demo.

Details:

  • (?<!\d) - no digit allowed immediately to the left of the current location
  • (\d{1,3}(?:,\d{3})+) - Group 1: one to three digits followed by one or more (due to the + quantifier) occurrences of a comma and three digits
  • (?!\d) - no digit allowed immediately to the right of the current location.
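
A quick way to try this pattern is Python's re module. The following is a minimal sketch; the sample text and the name GROUPED_NUMBER are made up for illustration. Note that the pattern is open-ended in both directions: it also accepts 1,000 through 9,999 and anything above 150,000,000, so a separate numeric check is still needed if the exact bounds matter.

```python
import re

# Pattern from this answer: 1-3 digits, then one or more ",ddd" groups,
# with no digit directly before or after the match.
GROUPED_NUMBER = re.compile(r"(?<!\d)(\d{1,3}(?:,\d{3})+)(?!\d)")

# Hypothetical sample text, not taken from the question.
text = "Fees: 500, 1,000, 10,000, 2,500,000, 150,000,000 and 12345."

# findall returns the contents of Group 1 for every match.
matches = GROUPED_NUMBER.findall(text)
print(matches)
```

Here 500 and 12345 are skipped because they have no comma group, while 1,000 is still picked up even though it is below 10,000.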

Upvotes: 1

The fourth bird

Reputation: 163477

Your current pattern could also match 999,999,999,999, because this part repeats 2 or more times with no upper limit: (?:,\d{3}){2,}

The pattern also only uses \d, which matches any digit 0-9; nothing in it restricts a digit to 5 or lower, so the upper bound of 150,000,000 is not enforced.
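
Both points can be checked with a short Python sketch. The sample values and the names ORIGINAL and results are chosen here for illustration only.

```python
import re

# The pattern from the question: {2,} demands at least two ",ddd" groups
# and puts no upper limit on how many may follow.
ORIGINAL = re.compile(r"(?<!\d)(\d{1,3}(?:,\d{3}){2,})(?!\d)")

# Hypothetical sample values around the intended bounds.
samples = ["10,000", "999,999", "1,000,000", "150,000,001", "999,999,999,999"]

# Map each sample to whether the whole string matches the pattern.
results = {s: ORIGINAL.fullmatch(s) is not None for s in samples}
for sample, matched in results.items():
    print(f"{sample:>16}  {matched}")
```

The two values below 1,000,000 are rejected, and the two values above 150,000,000 are accepted.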


Matching 3 digits after each comma, you could use an alternation | to match the separate parts of the range:

(?<!\S)(?:[1-9]\d\d?,\d{3}|(?:[1-9]\d?|1[0-4]\d),\d{3},\d{3}|150,000,000)(?!\S)
  • (?<!\S) Assert whitespace boundary to the left
  • (?: Non capture group
    • [1-9]\d\d?,\d{3} Match range 10,000 - 999,999
    • | Or
    • (?: Non capture group
      • [1-9]\d? Match range 1 - 99
      • | Or
      • 1[0-4]\d Match digits 100 - 149
    • ) Close non capture group
    • ,\d{3},\d{3} Match range ,000,000 - ,999,999
    • | Or
    • 150,000,000 Match the max value
  • ) Close non capture group
  • (?!\S) Assert whitespace boundary to the right

Regex demo
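
As a rough check of the breakdown above, here is a small Python sketch. The names RANGE and in_range and the sample strings are illustrative assumptions. Because of the whitespace boundaries, an amount with a symbol or punctuation attached, such as $10,000, would not match as written.

```python
import re

# Pattern from this answer, split over two string literals for readability.
RANGE = re.compile(
    r"(?<!\S)(?:[1-9]\d\d?,\d{3}|(?:[1-9]\d?|1[0-4]\d),\d{3},\d{3}"
    r"|150,000,000)(?!\S)"
)


def in_range(amount: str) -> bool:
    """Return True if the whole string is a grouped number in 10,000..150,000,000."""
    return RANGE.fullmatch(amount) is not None


# Hypothetical whitespace-separated input.
text = "bids: 9,999 10,000 149,999,999 150,000,000 150,000,001"
found = RANGE.findall(text)
print(found)
print(in_range("$10,000"))
```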

Upvotes: 0

Jan

Reputation: 43169

Just cast it and compare it programmatically:

import pandas as pd

dct = {"numbers": ["10", "100", "200", "5,000", "10,000", "15000", "some weird stuff", "160,000,000"]}


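def tester(number):
    # Strip the thousands separators, then compare the numeric value.
    try:
        number = float(number.replace(",", ""))
    except ValueError:
        # Not a number at all, e.g. "some weird stuff"
        return False
    return 10 * 10 ** 3 <= number <= 150 * 10 ** 6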

df = pd.DataFrame(dct)
df["in_range"] = df["numbers"].apply(tester)
print(df)

This yields

            numbers  in_range
0                10     False
1               100     False
2               200     False
3             5,000     False
4            10,000      True
5             15000      True
6  some weird stuff     False
7       160,000,000     False
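
If the amounts sit inside free text rather than in a column, one possible combination is to pull out comma-grouped candidates with a loose regex and then apply the same numeric comparison. This is only a sketch without pandas: the candidate pattern is borrowed from another answer in this thread, and the names CANDIDATE, LOW, HIGH and amounts_in_range are made up here.

```python
import re

# Loose candidate pattern: 1-3 digits followed by one or more ",ddd" groups.
CANDIDATE = re.compile(r"(?<!\d)\d{1,3}(?:,\d{3})+(?!\d)")

# Bounds from the question.
LOW, HIGH = 10_000, 150_000_000


def amounts_in_range(text):
    """Return the comma-grouped numbers in text whose value is in [LOW, HIGH]."""
    found = []
    for match in CANDIDATE.finditer(text):
        # Drop the separators and compare the integer value.
        value = int(match.group().replace(",", ""))
        if LOW <= value <= HIGH:
            found.append(match.group())
    return found


print(amounts_in_range("Paid $5,000, then $10,000 and $150,000,000, not $150,000,001."))
```

Unlike the tester function above, this sketch ignores ungrouped numbers such as 15000, because the candidate pattern requires at least one comma group.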

Upvotes: 0

Tim Biegeleisen

Reputation: 522292

One approach might be to use the generic regex for thousands, and then add a lookahead to restrict the lengths to the range you want:

^(?=.{6,11}$)\d{1,3}(?:,\d{3})*$

Demo
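
A short Python sketch of this idea follows; the name LENGTH_LIMITED and the sample values are illustrative. The lookahead limits the string length to 6 through 11 characters, not the numeric value, so 11-character amounts above 150,000,000 (such as 999,999,999) still pass.

```python
import re

# Pattern from this answer: the lookahead requires 6 to 11 characters in total,
# the rest is the generic thousands-grouped number.
LENGTH_LIMITED = re.compile(r"^(?=.{6,11}$)\d{1,3}(?:,\d{3})*$")

# Hypothetical sample values.
samples = ["1,000", "10,000", "100000", "150,000,000", "999,999,999", "1,000,000,000"]

# Map each sample to whether the pattern accepts it.
checks = {s: LENGTH_LIMITED.match(s) is not None for s in samples}
for sample, ok in checks.items():
    print(f"{sample:>14}  {ok}")
```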

Upvotes: 3
