user3536226

Reputation: 41

Regular expression that finds prices

I need a regular expression to extract all prices from a text. Prices use "," as the decimal separator, have no thousands separator, and are followed by " USD". For example:

1500 USD
9 USD
0,53 USD
12,01 USD

^[^0]\d+(\,)?[0-9]{0,2} USD

It works for:

1500 USD
12,01 USD

but it does not work for:

9 USD
0,53 USD

Upvotes: 1

Views: 239

Answers (3)

The fourth bird

Reputation: 163277

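In your pattern ^[^0]\d+(\,)?[0-9]{0,2} USD, in the part ^[^0], the first ^ is an anchor asserting the start of the string.

The second ^ is at the start of a character class, where its meaning is different: it creates a negated character class that matches any single character except 0. So the match cannot start with 0 (which rules out 0,53 USD), but it can start with any other character, including a letter. Because \d+ then requires at least one more digit, a single-digit price such as 9 USD does not match either.

The part (\,)?[0-9]{0,2} is an optional group that matches a comma (note that you don't have to escape it), followed by 0-2 digits. Because the digits are optional as well, a value ending in a comma, such as 12, USD, would also match.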
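One way to check this explanation is to run the original pattern against the question's samples plus two inputs it should reject. This is a sketch assuming a JavaScript engine (no language was tagged); the extra inputs "12, USD" and "a5 USD" are illustrative, not from the question.

```javascript
// The asker's original pattern, unchanged (the escaped comma is harmless here).
const original = /^[^0]\d+(\,)?[0-9]{0,2} USD/;

const samples = [
  "1500 USD",  // [^0] takes "1", \d+ takes "500"
  "9 USD",     // [^0] takes "9", but \d+ then needs another digit
  "0,53 USD",  // [^0] cannot match the leading "0"
  "12,01 USD", // [^0] takes "1", \d+ takes "2", then ",01"
  "12, USD",   // a comma with no decimals is still accepted
  "a5 USD",    // [^0] also accepts a non-digit such as "a"
];

// No g flag, so test() keeps no state between calls.
const results = samples.map((s) => original.test(s));
samples.forEach((s, i) => console.log(`${s} -> ${results[i]}`));
```

The last two samples are accepted even though they are not valid prices, which is why the answer below replaces [^0] and the optional comma group.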

No language is tagged, but if your regex engine supports a positive lookahead and a negative lookbehind, you might use the pattern below to extract prices from a text. The word boundary \b prevents USD from being part of a larger word, and (?<!\S) asserts that the character directly to the left is not a non-whitespace character (so it is either whitespace or the start of the string), which prevents the digits from being part of a larger word.

If you want the match to include USD instead of only the price, you can match USD directly instead of using the positive lookahead.

(?<!\S)\d+(?:,\d{1,2})?(?= USD\b)
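A minimal sketch of using this pattern, assuming JavaScript with lookbehind support (ES2018 or later, e.g. a current Node.js). The sample text and variable names are illustrative. The second regex is the whole-match variant described above, which consumes " USD" instead of looking ahead for it.

```javascript
// Pattern from the answer: the number only, with " USD" checked by a lookahead.
const priceOnly = /(?<!\S)\d+(?:,\d{1,2})?(?= USD\b)/g;
// Whole-match variant: " USD" is part of the match instead of a lookahead.
const withCurrency = /(?<!\S)\d+(?:,\d{1,2})? USD\b/g;

const sampleText = "Prices: 1500 USD, 9 USD, 0,53 USD and 12,01 USD. " +
  "Not prices: A7 USD, 5 USDT, 3,141 USD.";

// matchAll requires the g flag; m[0] is the whole match (here, only the number).
const numbers = [...sampleText.matchAll(priceOnly)].map((m) => m[0]);
// With the g flag, String.prototype.match returns every whole match.
const fullMatches = sampleText.match(withCurrency);

console.log(numbers);
console.log(fullMatches);
```

The decoys are rejected for different reasons: A7 fails the lookbehind, 5 USDT fails \b, and 3,141 has three decimals, so " USD" never follows the allowed digits.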

Another option is to use a capturing group instead of the lookarounds. (?:^|\s) matches either the start of the string or a whitespace character, and the price itself is captured in group 1.

(?:^|\s)(\d+(?:,\d{1,2})?) USD\b
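A sketch of reading group 1 in JavaScript (the variable names are illustrative). Since this version uses no lookbehind, it should also work in engines that lack lookbehind support.

```javascript
// Capturing-group variant: (?:^|\s) consumes the leading space, so the
// price is read from group 1 rather than from the whole match.
const pricePattern = /(?:^|\s)(\d+(?:,\d{1,2})?) USD\b/g;

const priceLine = "1500 USD 9 USD 0,53 USD 12,01 USD";
// Array.from accepts the matchAll iterator plus a mapping function.
const captured = Array.from(priceLine.matchAll(pricePattern), (m) => m[1]);

console.log(captured);
```

Because the whitespace is consumed, the whole match m[0] for every price except the first starts with a space, which is why group 1 is used.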

Upvotes: 2

Slawomir Dziuba

Reputation: 1325

In JavaScript

/^\d{1,}(,\d{2}){0,1} USD$/

    var regex = /^\d{1,}(,\d{2}){0,1} USD$/;
    // true result
    console.log(regex.test('9 USD'));
    console.log(regex.test('0,53 USD'));
    console.log(regex.test('12,01 USD'));
    console.log(regex.test('1500 USD'));
    // false result
    console.log(regex.test(' USD'));
    console.log(regex.test('0,5,3 USD'));
    console.log(regex.test('12,0124 USD'));
    console.log(regex.test('1s500 USD'));

Or with sed:

% echo "1500 USD 9 USD 0,53 USD 12,01 USD" |sed  -E 's/[0-9]+(,[0-9][0-9]){0,1} USD/TRUE/g'
TRUE TRUE TRUE TRUE

The -E option enables extended regular expressions.
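For comparison, here is a sketch of the same unanchored sed pattern in JavaScript. It is not part of the original answer, and the variable names are illustrative. Unlike the anchored /^...$/ version, it finds prices inside a longer line. However, without anchors or boundaries it would also match the 9 USD inside a string like a9 USD.

```javascript
// Same pattern as the sed command; the g flag plays the role of sed's trailing /g.
const sedPattern = /[0-9]+(,[0-9][0-9]){0,1} USD/g;
const sedLine = "1500 USD 9 USD 0,53 USD 12,01 USD";

// Replace every price, like the sed substitution.
const replaced = sedLine.replace(sedPattern, "TRUE");
// Or collect the prices instead; with g, match returns the whole matches.
const found = sedLine.match(sedPattern);

console.log(replaced); // TRUE TRUE TRUE TRUE
console.log(found);
```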

Upvotes: 2

Emma

Reputation: 27723

My guess is that this simple expression would return what we want:

([0-9,.]+)

since no validation is required here. This assumes that our prices are valid and that no other digits, commas, or periods appear in the text.
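To make that assumption concrete, here is a JavaScript sketch (the sample strings are illustrative). On a text that contains only prices, the pattern returns exactly the prices. Any other run of digits, commas, or periods is returned as well.

```javascript
// The simple pattern from this answer, with the g flag to collect every run.
const simple = /([0-9,.]+)/g;

// Only prices: every run of digits and commas is a price.
const pricesOnlyText = "1500 USD 9 USD 0,53 USD 12,01 USD";
// Other text containing digits or punctuation: those runs are returned too.
const mixedText = "Order 42, shipped: 1500 USD.";

console.log(pricesOnlyText.match(simple));
console.log(mixedText.match(simple));
```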

Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"([0-9,.]+)";
        string input = @"500 USD 9 USD 0,53 USD 12,01 USD
1500 USD 12,01 USD 9 USD 0,53 USD  1500 USD 12,01 USD 9 USD 0,53 USD ";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

Demo

const regex = /([0-9,.]+)/gm;
const str = `500 USD 9 USD 0,53 USD 12,01 USD
1500 USD 12,01 USD 9 USD 0,53 USD  1500 USD 12,01 USD 9 USD 0,53 USD `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Upvotes: -1