Saurabh Kumar

Reputation: 16671

Extract specific data from a string using a regex pattern

I have data like the following:

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03
  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40
  3. 2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt
  4. versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-
  5. 3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60
  6. Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 500 g, 1 kg = 6.44
  7. versch. Sorten, 400 g, 1 kg = 5.60
  8. 400 g, versch. Sorten, 1 kg = 5.60

Expected Outcome

  1. 12 x 720 ml => { pack: 12, weight:720 , unit: ml }
  2. 2 x 250 g => { pack: 2, weight:250 , unit: g }
  3. 2 x 950 g => { pack: 2, weight:950 , unit: g }
  4. 2 x 500 g => { pack: 2, weight:500 , unit: g }
  5. 3 x 1 Liter => { pack: 3, weight:1 , unit: Liter }
  6. 500 g => { pack: 1, weight:500 , unit: g }
  7. 400 g => { pack: 1, weight:400 , unit: g }
  8. 400 g => { pack: 1, weight:400 , unit: g }

I tried the following code:

const re = /^(\d+x)?([\d,]+)([a-z]+)/gm;

str.split(",").forEach(v => {
   const value = v.replace(/\s/g, "")
   let arr = [...value.matchAll(re)];
   console.log(arr[0]);
})

Results for the input strings using the above code:

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03

["12x", undefined, "12", "x"] ["12x720ml", "12x", "720", "ml"] undefined ["1kg", undefined, "1", "kg"]

  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40

undefined ["2x250g", "2x", "250", "g"] undefined ["100g", undefined, "100", "g"]

and so on...

I can't figure out how to extract the desired data, or whether it is even possible, since the required segment does not appear at a fixed position in the string.

EDIT ( NEW )

Wiktor Stribiżew's solution works perfectly for the above cases.

New Requirement -

  1. 12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03
  2. versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40
  3. 2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt
  4. versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-
  5. 3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60
  6. Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 400 - 500 g, 1 kg = 6.44 ( Range )
  7. versch. Sorten, 400 g, 1 kg = 5.60
  8. 100 - 400 g, versch. Sorten, 1 kg = 5.60 ( Range )

Expected Outcome

  1. 12 x 720 ml => { pack: 12, minweight:720 , maxweight: 0, unit: ml }
  2. 2 x 250 g => { pack: 2, minweight:250 , maxweight: 0, unit: g }
  3. 2 x 950 g => { pack: 2, minweight:950 , maxweight: 0, unit: g }
  4. 2 x 500 g => { pack: 2, minweight:500 , maxweight: 0, unit: g }
  5. 3 x 1 Liter => { pack: 3, minweight:1 , maxweight: 0, unit: Liter }
  6. 400 - 500 g => { pack: 1, minweight:400 , maxweight: 500, unit: g }
  7. 400 g => { pack: 1, minweight:400 , maxweight: 0, unit: g }
  8. 100 - 400 g => { pack: 1, minweight:100 , maxweight: 400, unit: g }

Upvotes: 2

Views: 101

Answers (1)

Wiktor Stribiżew

Reputation: 627083

You can use

const arr = ['12 x ATG 370 g, 12 x 720 ml, 1 Glas = 0.97, 1 kg = 2.03','versch. Sorten, 2 x 250 g, 1 Packung = 1.-, 100 g = 0.40','2 x 950 g, 1 Packung = 4.98, 1 kg = 4.47, tiefgekühlt','versch. Sorten, 2 x 500 g, 1 Packung = 0.65, 1 kg = 1.-','3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60','Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 400 - 500 g, 1 kg = 6.44','versch. Sorten, 400 g, 1 kg = 5.60','100 - 400 g, versch. Sorten, 1 kg = 5.60'];
const re = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+(?:\s*-\s*\d+)?)\s*([a-zA-Z]+)(?:$|,)/;
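arr.forEach( str => {
   const m = str.match(re);
   if (!m) return; // skip strings that contain no quantity segment
   let [_, pack, weight, unit] = m;
   pack = pack || 1; // no "N x" prefix means a single pack
   console.log(str, {'pack': pack, 'weight': weight, 'unit': unit});
})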

The regex matches:

  • (?:,\s*|^) - either a comma followed by zero or more whitespace characters, or the start of the string
  • (?:(\d+)\s*x\s*)? - an optional sequence of
    • (\d+) - Capturing group 1 (pack): one or more digits
    • \s*x\s* - x surrounded by zero or more whitespace characters
  • (\d+(?:\s*-\s*\d+)?) - Capturing group 2 (weight): one or more digits, optionally followed by a - surrounded by zero or more whitespace characters and then one or more digits
  • \s* - zero or more whitespace characters
  • ([a-zA-Z]+) - Capturing group 3 (unit): one or more letters
  • (?:$|,) - either the end of the string or a comma
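
To see why the two anchors matter, the pattern can be run against single segments taken from the sample data. This is only a small sketch: the helper name groups is made up for illustration and is not part of the answer.

```javascript
// Same pattern as in the answer above.
const re = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+(?:\s*-\s*\d+)?)\s*([a-zA-Z]+)(?:$|,)/;

// Hypothetical helper: returns the captured groups [pack, weight, unit],
// or null when the segment does not match.
function groups(str) {
  const m = str.match(re);
  return m ? m.slice(1) : null;
}

console.log(groups('12 x ATG 370 g')); // null: "ATG" sits between "x" and the number
console.log(groups('1 kg = 2.03'));    // null: "kg" is followed by " =", not a comma or the end
console.log(groups('12 x 720 ml'));    // [ '12', '720', 'ml' ]
console.log(groups('400 - 500 g'));    // [ undefined, '400 - 500', 'g' ]
```

Segments such as "1 kg = 2.03" are rejected because the unit must be followed directly by a comma or the end of the string, which is what keeps the price information out of the result.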

See the regex demo.
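
The snippet above returns a range as one string such as "400 - 500". To get the minweight / maxweight shape from the question's edit, one possible approach is to split the range into two capture groups. The following is a sketch under that assumption; rangeRe and parseQuantity are illustrative names, not part of the original answer, and maxweight: 0 is used for "no range" only because the question's expected output does so.

```javascript
// Variant of the answer's pattern: the lower and upper bound of a range
// are captured separately (groups 2 and 3) instead of as one string.
const rangeRe = /(?:,\s*|^)(?:(\d+)\s*x\s*)?(\d+)(?:\s*-\s*(\d+))?\s*([a-zA-Z]+)(?:$|,)/;

// Hypothetical helper: returns the parsed quantity, or null if none is found.
function parseQuantity(str) {
  const m = str.match(rangeRe);
  if (!m) return null; // no quantity segment in this string
  const [, pack, min, max, unit] = m;
  return {
    pack: pack ? Number(pack) : 1,    // no "N x" prefix means a single pack
    minweight: Number(min),
    maxweight: max ? Number(max) : 0, // 0 stands for "no range", as in the question
    unit,
  };
}

console.log(parseQuantity('Krönung Balance gemahlen oder Krönung Aroma ganze Kaffeebohnen, 400 - 500 g, 1 kg = 6.44'));
// { pack: 1, minweight: 400, maxweight: 500, unit: 'g' }
console.log(parseQuantity('3,5 % Fett, 3 x 1 Liter, 1 Packung = 0.76, 1 Liter = 0.60'));
// { pack: 3, minweight: 1, maxweight: 0, unit: 'Liter' }
```

Converting the captures with Number keeps the result numeric, which matches the unquoted values in the expected outcome. Weights with decimals (for example "1,5 kg") are not covered by either pattern.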

Upvotes: 1
