Sergey Novikov

Reputation: 4196

Get variable parts of string with known format

I want to check a string for a certain format and, if it matches, assign values to variables from certain parts of the string.

For example, the string format is 'num_{month}_{year}_10p' and the string is 'num_october_16_10p'. I want to assign the variable parts of the string ({month} and {year}) to variables. I don't know the exact string format in advance, so I wrote a simple function:

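function test(string, regexp, monthPart, yearPart) {
    // Reject anything that is not a RegExp or does not match the expected format
    if (!(regexp instanceof RegExp) || !regexp.test(string)) {
        return false;
    }

    // I know that delimiter is underscore
    var parts = string.split('_');

    // monthPart and yearPart are the indexes of the parts after splitting
    return {
        month: parts[monthPart],
        year: parts[yearPart]
    };
}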

I use it like test('num_october_16_10p', /num_[a-z]{3,9}_[0-9]{2}_10p/, 1, 2); and generate the regular expression depending on the situation.

Is there a better way to do it, ideally using a regexp only? And how can I support any string format at all, without relying on a specific delimiter and split()?

Upvotes: 1

Views: 91

Answers (3)

Ahmet Can Güven

Reputation: 5462

This covers all of the following cases:

  • num_october_16_10p
  • util_time_october_17
  • october_17_10p_num

In each match array, index 4 (capture group 4) will be the month and index 5 (capture group 5) will be the year.

const regex = /(_|(\w+|^))(_|^)(\w+)_(\d+)(_|$)/gm;
const str = `num_october_16_10p
util_time_october_17
october_17_10p_num`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
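To read only the two values of interest rather than logging every group, the same pattern can be applied to one string at a time. This is a minimal sketch: the helper name monthAndYear is hypothetical, and the g and m flags are dropped on the assumption that each input is a single line. Note that the pattern does not know what a month is; it returns whichever word-then-digits pair it finds, so it is only suitable when the input is known to follow one of the formats above.

```javascript
// Same pattern as in the answer, without the g/m flags (single-line input assumed).
const pattern = /(_|(\w+|^))(_|^)(\w+)_(\d+)(_|$)/;

// Hypothetical helper: returns { month, year } or null when nothing matches.
function monthAndYear(line) {
    const m = pattern.exec(line);
    // Group 4 is the word before the digits, group 5 is the digits themselves.
    return m ? { month: m[4], year: m[5] } : null;
}

const samples = ['num_october_16_10p', 'util_time_october_17', 'october_17_10p_num'];
for (const sample of samples) {
    console.log(sample, monthAndYear(sample));
}
```

The year is returned as a string here; convert it with Number() if a numeric value is needed.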

Upvotes: 0

trincot

Reputation: 350715

This will work with any reasonable delimiter and order, but expects month names to be either the full English names or three-letter abbreviations. Years can be either 2-digit or 4-digit numbers. If a string contains more than one possible match, only the first one is regarded:

function extractDateParts(s) {
    return {
        month: (s.match(/([^a-z]|^)(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|june?|july?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)(?![a-z])/i) || [])[2],
        year: +(s.match(/([^a-z0-9]|^)(\d\d(\d\d)?)(?![a-z0-9])/) || [])[2] || undefined
    };
}

console.log(extractDateParts('num_october_16_10p'));

Upvotes: 1

Ari Lotter

Reputation: 615

You can use the same regular expression for both matching and extracting the "variable string parts" by using capturing groups. A capturing group is created by putting parentheses around the tokens you'd like to capture. Your existing regex can be modified to match and capture from num_october_16_10p like this: num_([a-z]{3,9})_([0-9]{2})_10p. You can then use it as follows (shown here in Python):

import re
regex = re.compile(r'num_([a-z]{3,9})_([0-9]{2})_10p')
matches = regex.match('num_october_16_10p')
matches.group(0) # 'num_october_16_10p'
matches.group(1) # 'october'
matches.group(2) # '16'
matches.groups() # ('october', '16')

Since you seem to be generating the matching regex dynamically, you should be able to add capturing groups.
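Since the question is in JavaScript, here is a minimal sketch of the same idea there, including one possible way to generate the capturing regex from a format such as 'num_{month}_{year}_10p'. The helpers escapeRegExp, templateToRegExp and extractParts, and the patterns map, are hypothetical names introduced for illustration. The sketch assumes placeholder names are valid identifiers (they become named capturing groups, which need ES2018 or later) and falls back to a lazy .+? for placeholders without an explicit pattern.

```javascript
// Escape regex metacharacters so literal parts of the template match literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds an anchored RegExp with one named group per {placeholder}.
function templateToRegExp(template, patterns) {
    const source = template
        // Splitting on a capturing group keeps placeholder names at odd indexes.
        .split(/\{(\w+)\}/)
        .map((piece, i) => i % 2 === 0
            ? escapeRegExp(piece)
            : `(?<${piece}>${patterns[piece] || '.+?'})`)
        .join('');
    return new RegExp(`^${source}$`);
}

// Returns an object with the captured parts, or null if the format does not match.
function extractParts(input, template, patterns) {
    const match = templateToRegExp(template, patterns).exec(input);
    return match ? { ...match.groups } : null;
}

const patterns = { month: '[a-z]{3,9}', year: '[0-9]{2}' };
console.log(extractParts('num_october_16_10p', 'num_{month}_{year}_10p', patterns));
// { month: 'october', year: '16' }
```

Because the literal pieces are escaped and nothing is split on a fixed character, the same helper works with other delimiters, for example extractParts('report.16.october', 'report.{year}.{month}', patterns).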

Upvotes: 1
