Reputation: 4196
I want to check a string for a certain format and, if it matches, assign values to variables from certain parts of the string.
For example, the string format is 'num_{month}_{year}_10p' and the string is 'num_october_16_10p'. I want to assign the variable parts ({month} and {year}) to variables. I don't know the exact string format in advance, so I wrote a simple function:
function test(string, regexp, monthPart, yearPart) {
if(!(regexp instanceof RegExp) || !regexp.test(string)) {
return false;
}
// I know that delimiter is underscore
var parts = string.split('_');
return {
month: parts[monthPart],
year: parts[yearPart]
};
}
And I use it like test('num_october_16_10p', /num_[a-z]{3,9}_[0-9]{2}_10p/, 1, 2); generating the regular expression depending on the situation.
Is there a better way to do it, ideally using the regexp only? And how can I support any string format, without relying on a known delimiter and split()?
Upvotes: 1
Views: 91
Reputation: 5462
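This pattern handles the three sample layouts below, where the month and year pair sits at the start, in the middle, or at the end of the string.
In each match, capture group 4 holds the month and capture group 5 holds the year.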
const regex = /(_|(\w+|^))(_|^)(\w+)_(\d+)(_|$)/gm;
const str = `num_october_16_10p
util_time_october_17
october_17_10p_num`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
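If you only need the month and year rather than a log of every group, a minimal sketch along these lines could collect them directly. It reuses the same pattern and sample strings; the names `pattern`, `input` and `results` are just illustrative, and it assumes an environment that supports `String.prototype.matchAll` (ES2020).

```javascript
// Same pattern as above: group 4 captures the month, group 5 the year.
const pattern = /(_|(\w+|^))(_|^)(\w+)_(\d+)(_|$)/gm;
const input = `num_october_16_10p
util_time_october_17
october_17_10p_num`;

// matchAll needs the g flag and yields one match array per hit,
// so no manual lastIndex bookkeeping is required here.
const results = Array.from(input.matchAll(pattern), (m) => ({
  month: m[4],
  year: m[5], // still a string; convert with Number() if needed
}));

console.log(results);
```

Note that `\w` also matches underscores and digits, so the pattern relies on backtracking to find the split points and does not check that group 4 is a real month name.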
Upvotes: 0
Reputation: 350715
This will work with any reasonable delimiter and order, but expects month names to be either the full English names or three-letter abbreviations. Years can be either 2-digit or 4-digit numbers. If a string contains more than one possible match, only the first one is used:
function extractDateParts(s) {
return {
month: (s.match(/([^a-z]|^)(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|june?|july?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)(?![a-z])/i) || [])[2],
year: +(s.match(/([^a-z0-9]|^)(\d\d(\d\d)?)(?![a-z0-9])/) || [])[2] || undefined
};
}
console.log(extractDateParts('num_october_16_10p'));
Upvotes: 1
Reputation: 615
You can use the same regular expression for matching and extraction of the "variable string parts" by using capturing groups. You can create a capturing group by putting parentheses around the tokens you'd like to capture. You can modify your existing regex to match num_october_16_10p like this: num_([a-z]{3,9})_([0-9]{2})_10p. You can then use it like this (shown here in Python):
import re
regex = re.compile(r'num_([a-z]{3,9})_([0-9]{2})_10p')
matches = regex.match('num_october_16_10p')
matches.group(0) # 'num_october_16_10p'
matches.group(1) # 'october'
matches.group(2) # '16'
matches.groups() # ('october', '16')
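Since the question is about JavaScript, here is a rough equivalent of the capture-group idea there. It is only a sketch: `extractParts` is a made-up name, and it returns false on a failed match simply to mirror the function in the question.

```javascript
// Same regex as above, with capturing groups around the month and year.
const regex = /num_([a-z]{3,9})_([0-9]{2})_10p/;

// Hypothetical helper: one match() call both validates and extracts.
function extractParts(input, pattern) {
  const match = input.match(pattern);
  if (match === null) {
    return false; // the string does not have the expected format
  }
  // match[0] is the whole match; match[1] and match[2] are the groups.
  return { month: match[1], year: match[2] };
}

console.log(extractParts('num_october_16_10p', regex));
```

This removes the need for split() and for passing part indexes around, because the group positions come from the regex itself.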
Since you seem to be generating the matching regex dynamically, you should be able to add capturing groups.
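One possible way to do that generation, sketched in JavaScript under some assumptions: the format is given as a template like 'num_{month}_{year}_10p', placeholder names are valid identifiers, and you supply a sub-pattern per placeholder. `templateToRegExp`, `extractFields` and `fieldPatterns` are hypothetical names, and named groups need an ES2018-capable engine.

```javascript
// Hypothetical helper: builds a RegExp with one named capturing group
// per {placeholder} in the template.
function templateToRegExp(template, fieldPatterns) {
  // Splitting on a regex with a capture group keeps the placeholder
  // names in the result, at the odd indexes.
  const pieces = template.split(/\{(\w+)\}/);
  const source = pieces.map((piece, i) => {
    if (i % 2 === 1) {
      // Assumption: fall back to a lazy wildcard for unknown fields.
      const fieldPattern = fieldPatterns[piece] ?? '.+?';
      return `(?<${piece}>${fieldPattern})`;
    }
    // Literal text: escape characters that are special in a regex.
    return piece.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }).join('');
  return new RegExp(`^${source}$`);
}

// Returns an object of captured fields, or null if the input does not match.
function extractFields(input, template, fieldPatterns) {
  const match = templateToRegExp(template, fieldPatterns).exec(input);
  return match === null ? null : { ...match.groups };
}

const fields = extractFields('num_october_16_10p', 'num_{month}_{year}_10p', {
  month: '[a-z]{3,9}',
  year: '[0-9]{2}',
});
console.log(fields);
```

Because the literal parts of the template are escaped and copied as they are, this does not depend on any particular delimiter. The captured values are strings, so convert the year with Number() if you need a number.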
Upvotes: 1