server info

Reputation: 1335

optional matching of multiple regex subpattern

I have a regex problem that has been bugging me, and I have no idea how to solve it.

I have an input field containing text, and I would like to extract certain values from it: a title, a description, a price and a special price.

Examples of the input are listed at the end of the question.

The CoffeeScript pattern I'm using:

 pattern = ///
   ([^$]+)
   (#(.+?)#+)
   ([\$]\d+\.\d+)
   ([\%\$]\d+\.\d+)
 ///
 params = [title, description, oldPrice, newPrice] = input_txt.match(pattern)[1..4]

It does not work the way I want. It only matches if I enter all values in the given sequence, and every one of the substrings has to be present.

What I would like is the ability to extract the segments only if they are provided (so each one is optional), regardless of the order in which they appear. How can I extract optional parts of a string in any order?

EDIT: Here are some examples.

exmp1:

Kindle #Amazon's ebook reader# $79.00

this should be extracted as

title:Kindle 
description: Amazon's ebook reader 
oldPrice:$79.00

exmp2:

Nike Sneaker's $109.00 %$89.00

this should be extracted as

title:Nike Sneaker's 
oldPrice:$109.00 
newPrice:$89.00

exmp3:

$100.00 Just dance 3 #for XBox# 

this should be extracted to

title: Just dance 3 
description: for XBox 
oldPrice:$100.00

Any help would be great ...

Upvotes: 0

Views: 1141

Answers (2)

jfriend00

Reputation: 707426

You can use this code, which looks for and removes each separate piece of the match:

function extractParts(str) {
    var parts = {};

    function removePiece(re) {
        var result;
        var matches = str.match(re);
        if (matches) {
            result = matches[1];
            str = str.replace(re, "");
        }
        return(result);
    }

    // find and remove each piece we're looking for
    parts.description = removePiece(/#([^#]+)#/);        // #text#
    // take the %-prefixed special price first, so the plain-price pattern
    // below cannot pick it up and also works at the very start of the string
    parts.newPrice = removePiece(/%(\$\d+\.\d+)/);       // %$3.78
    parts.oldPrice = removePiece(/(\$\d+\.\d+)/);        // $4.56
    // fix up whitespace
    parts.title = str.replace(/\s+/g, " ").replace(/^\s+/, "").replace(/\s+$/, "");
    return(parts);
}

var pieces = extractParts("Kindle #Amazon's ebook reader# $79.00");

And, you can see a demo in action here: http://jsfiddle.net/jfriend00/d8NNr/.

Upvotes: 1

yankee

Reputation: 40810

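The nature of regular grammars makes it hard to solve your problem with a single pattern: a regex fixes the order of its groups, so accepting the parts in any order would essentially mean spelling out every permutation. As a workaround, the simplest solution would be to run a separate regex for each part, in four steps: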

  1. Match /#(.+?)#+/ and remove the matched string (string replace) from the original
  2. Match /%\$\d+\.\d+/ and remove the matched string from the original
  3. Match /\$\d+\.\d+/ and... you get the pattern
  4. Now what remains in the original is the title.
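
The steps above can be sketched roughly like this in JavaScript. This is only a sketch under a few assumptions: the names `extractFields` and `steps` are made up for illustration, the special price is assumed to always be written with a leading `%` (as in `%$89.00`), and the dot in the prices is escaped. The special price is removed before the plain price so that the plain `$` pattern cannot pick it up.

```javascript
// Sketch of the match-and-remove steps. `extractFields` and `steps` are
// illustrative names, not part of the original answer.
function extractFields(input) {
    // Order matters: the "%$" special price is taken out before the plain "$" price.
    var steps = [
        ["description", /#(.+?)#+/],        // #text#
        ["newPrice", /%(\$\d+\.\d+)/],      // %$89.00
        ["oldPrice", /(\$\d+\.\d+)/]        // $109.00
    ];
    var rest = input;
    var fields = {};

    steps.forEach(function (step) {
        var name = step[0];
        var re = step[1];
        var match = rest.match(re);
        if (match) {
            fields[name] = match[1];
            // replace the matched piece with a space so neighbouring words stay apart
            rest = rest.replace(re, " ");
        }
    });

    // whatever is left over is the title; collapse and trim the whitespace
    fields.title = rest.replace(/\s+/g, " ").trim();
    return fields;
}

console.log(extractFields("Kindle #Amazon's ebook reader# $79.00"));
console.log(extractFields("Nike Sneaker's $109.00 %$89.00"));
console.log(extractFields("$100.00 Just dance 3 #for XBox# "));
```

Pieces that are missing from the input are simply left out of the returned object, so the caller can check for `undefined` instead of relying on a fixed group position.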

Upvotes: 4
