PirateApp

Reputation: 6204

How do I use regex to pick the following price values?

I am trying to extract price values in Rs from strings in JavaScript, such as the following:

The price was Rs.1000
The price was Rs 1000
The price was Rs.1000 - 5000
The price was Rs.1000 - Rs.5000
The price was Rs.50,000
The price was Rs 1,25,000 - Rs 2,45,000

Given how much the input varies, it is obviously not a good idea to write a single long, cumbersome regular expression. Currently I have divided this task into 4 parts.

Part 1 // Extracts all Rs.1000 or Rs 1000

var regex = new RegExp(/\brs\W*?(\d{1,7})\b(?![,\d])/i)

Part 2 //Extracts all Rs.1000 - 2000 or Rs 1000 - Rs 2000 and any combinations of this

regex = new RegExp(/\brs\W*?(\d{1,7})\b(?![,\d])\s*?(?:-|to)\s*?(?:\brs\b\W*?)?(\d{1,7})\b(?![,\d])/i)

I need to capture the currency values, like 1000 and 2000, so that I can store and process them.

A few questions right off the bat. My array in JS has around 3000 items. I am stuck on Parts 3 and 4, which involve commas. Is this the right way to go about it? How do I get the values in one go when commas are present?

I only want the numeric values and do not care where the commas are placed. The regex \brs\W*?\d.,?.\d\b seems to capture both plain numbers and numbers with commas. I am trying to take this expression one step further so that it also covers the 1000 - 2000 type. Any ideas?

Upvotes: 3

Views: 146

Answers (1)

Wiktor Stribiżew

Reputation: 626747

You can use a regex for this task: the prices follow a regular pattern, so just create the pattern dynamically. There are 2 main blocks: one matches prices glued to other words (so that this text can be skipped), and the other captures prices only in valid contexts.

The whole regex looks ugly and long:

/\Brs\W*(?:\d{1,7}(?:,\d+)*)\b(?:\s*(?:-|to)\s*(?:\brs\b\W*?)?(?:\d{1,7}(?:,\d+)*)\b)?|\brs\W*(\d{1,7}(?:,\d+)*)\b(?:\s*(?:-|to)\s*(?:\brs\b\W*?)?(\d{1,7}(?:,\d+)*)\b)?/gi

However, it is clear it consists of simple and easily editable building blocks:

  • (\\d{1,7}(?:,\\d+)*)\\b - the number part
  • rs\\W*${num}(?:\\s*(?:-|to)\\s*(?:\\brs\\b\\W*?)?${num})? - the price part

NOTE that, for the first (\B) alternative, the capturing groups are made non-capturing with .replace(/\((?!\?:)/g, '(?:') inside the RegExp constructor call.
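To make the note above concrete, here is a minimal sketch of what that .replace call does to the number block. The helper name toNonCapturing is only an illustration and is not part of the original answer.

```javascript
// The number block from the answer, as a string for the RegExp constructor.
const num = "(\\d{1,7}(?:,\\d+)*)\\b";

// Hypothetical helper: turn every "(" that does not already start "(?:"
// into "(?:", so the group no longer captures.
const toNonCapturing = (src) => src.replace(/\((?!\?:)/g, '(?:');

const converted = toNonCapturing(num);
console.log(converted);

// The original block has one capturing group, the converted one has none.
console.log(new RegExp(num).exec("1,000").length);       // match + 1 group
console.log(new RegExp(converted).exec("1,000").length); // match only
```

This simple replacement is only safe for patterns like the ones here. It would also rewrite an escaped \( or the start of a lookahead such as (?= and (?!, so it should not be applied blindly to arbitrary patterns.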

See the JS demo:

const num = "(\\d{1,7}(?:,\\d+)*)\\b";
const block = `rs\\W*${num}(?:\\s*(?:-|to)\\s*(?:\\brs\\b\\W*?)?${num})?`;
const regex = RegExp(`\\B${block.replace(/\((?!\?:)/g, '(?:')}|\\b${block}`, 'ig');
const str = `The price was Rs.1000
    The price was Rs 1000
    The price was Rs.1000 - 5000
    The price was Rs.1000 - Rs.5000
    The price was Rs.50,000
    The price was Rs 1,25,000 - Rs 2,45,000
    The price was dummytestRs 1,2665,000 - Rs 2,45,000`;
let m;
let result = [];
while ((m = regex.exec(str)) !== null) {
  if (m[2]) {
    result.push([m[1].replace(/,/g, ''), m[2].replace(/,/g, '')]);
  } else if (m[1]) {
    result.push([m[1].replace(/,/g, ''), ""]);
  }
}
document.body.innerHTML = "<pre>" + JSON.stringify(result, null, 4) + "</pre>";
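Since the question mentions an array of around 3000 strings and the need to store and process the values, the same regex could also be applied per item and the captures converted to numbers. The following is a sketch only: the helpers toNumber and extractPrices, the { min, max } result shape, and the choice to set max equal to min for a single price are assumptions for illustration, not part of the demo above.

```javascript
// Same building blocks as in the demo above.
const num = "(\\d{1,7}(?:,\\d+)*)\\b";
const block = `rs\\W*${num}(?:\\s*(?:-|to)\\s*(?:\\brs\\b\\W*?)?${num})?`;
const priceRegex = RegExp(`\\B${block.replace(/\((?!\?:)/g, '(?:')}|\\b${block}`, 'ig');

// Hypothetical helper: "1,25,000" -> 125000.
const toNumber = (s) => Number(s.replace(/,/g, ''));

// Hypothetical helper: one { min, max } object per price found in `text`.
function extractPrices(text) {
  const prices = [];
  for (const m of text.matchAll(priceRegex)) {
    // Matches from the \B branch (price glued to a word) have no captures.
    if (m[1] === undefined) continue;
    const min = toNumber(m[1]);
    // Assumption: a single price is stored as a range with min === max.
    const max = m[2] !== undefined ? toNumber(m[2]) : min;
    prices.push({ min, max });
  }
  return prices;
}

const items = [
  "The price was Rs.1000",
  "The price was Rs 1,25,000 - Rs 2,45,000",
  "The price was dummytestRs 1,2665,000 - Rs 2,45,000",
];
const extracted = items.map(extractPrices);
console.log(JSON.stringify(extracted));
// [[{"min":1000,"max":1000}],[{"min":125000,"max":245000}],[]]
```

String.prototype.matchAll requires the g flag, which the regex already has, and it works on a copy of the regex, so the lastIndex state is not shared between items.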

Upvotes: 3
