Reputation: 1093
I need to truncate the portions of a string that are found between matches of a regex pattern.
If a portion's length is <= the given padding, leave the portion as it is.
Code:
// note that the 'x' characters below could be any characters, even spaces or line breaks.
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx'
const truncateBetweenPattern = (str, pattern, padding=0, sep='...') => {
  // code
}
const pattern = '(<mark>.+</mark>)' // (not sure if this is valid)
const result = truncateBetweenPattern(str, pattern, 3)
Expected output:
result === '...xxx<mark>foo</mark>xx<mark>bar</mark>xxx...xxx<mark>baz</mark>xxx...'
Upvotes: 0
Views: 516
Reputation: 163457
If your environment supports lookbehind assertions, you can handle the 4 different scenarios with capture groups and lookarounds.
In the code, check which group has a value and, based on the group number, return the right replacement:
- Group 1 captures <mark>....</mark>; in that case you just return the unmodified match.
- Group 2 captures the part between </mark> and the next <mark>; you replace it with the separator if its length is greater than 2 times the padding.
- Group 3 captures the part up to the first occurrence of <mark>, and group 4 the part after the last occurrence of </mark>; you replace these if their length is greater than the padding.
The regex has one capture group per scenario:
const regex = /(<mark>[^]*?<\/mark>)|(?<=<\/mark>)([^]*?)(?=<mark>)|([^]*?)(?=<mark>)|(?<=<\/mark>)([^]*)/g;
const truncateBetweenPattern = (str, pattern, padding = 0, sep = '...') => {
  // Note: the `pattern` argument is not used; the function relies on `regex` above.
  if (padding <= 0) return str;
  return str.replace(regex, (m, g1, g2, g3, g4) => {
    if (g1) return g1;                                   // <mark>...</mark>: keep as is
    else if (g2 && g2.length > padding * 2) {            // between </mark> and <mark>
      return g2.slice(0, padding) + sep + g2.slice(-padding);
    } else if (g3 && g3.length > padding) {              // before the first <mark>
      return sep + g3.slice(-padding);
    } else if (g4 && g4.length > padding) {              // after the last </mark>
      return g4.slice(0, padding) + sep;
    } else return m;
  });
};
const strings = [
  'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx',
  'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxx<mark>baz</mark>xxxxxxxx',
  'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxx<mark>baz</mark>xxxxxxxx',
  'xx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxx<mark>baz</mark>xxxxxxxx',
  'xx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxxxxxxxxxxxxx<mark>baz</mark>xx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxxxxxxxxxxxxx<mark>baz</mark>'
];
strings.forEach(s => console.log(truncateBetweenPattern(s, regex, 3)));
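To see which of the four groups captures which piece of the string, here is a small sketch (assuming, like the answer, an engine with lookbehind support) that runs the same regex through `String.prototype.matchAll` on the first sample string. The names `str` and `groupValues` are only for illustration.

```javascript
// Same regex as in the answer: one capture group per scenario.
const regex = /(<mark>[^]*?<\/mark>)|(?<=<\/mark>)([^]*?)(?=<mark>)|([^]*?)(?=<mark>)|(?<=<\/mark>)([^]*)/g;
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx';

// For every match, record which group (1-4) participated and what it captured.
// Non-participating groups are undefined, so an empty capture would still be counted.
const groupValues = [...str.matchAll(regex)].map(m => {
  const group = [1, 2, 3, 4].find(n => m[n] !== undefined);
  return [group, m[group]];
});
console.log(groupValues);
```

On this input the leading `xxxxx` comes back in group 3, each `<mark>` element in group 1, the gaps `xx` and `xxxxxxxxx` in group 2, and the trailing `xxxxxxxx` in group 4, which is the branching the replacement callback relies on.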
Upvotes: 1
Reputation: 350760
You could split the string by the pattern and also keep the matched parts themselves (by wrapping the pattern in a capture group). Then map each part. A separating part (your mark tag) always has an odd index; in that case just echo that part unchanged. Any other part is mapped with another regex that matches only when a separator needs to be injected. Design three regexes for this purpose: one for the prefix, one for the postfix, and one for all other parts.
In the boundary case where the pattern is not found at all, the original string is returned.
Here is how that could be coded:
const truncateBetweenPattern = (str, pattern, padding=0, sep='...') => {
  const re = [
    RegExp(`^().+?(.{${padding}})$`, "s"),               // prefix: keep the last `padding` chars
    RegExp(`^(.{${padding}}).+?(.{${padding}})$`, "s"),  // middle: keep both ends
    RegExp(`^(.{${padding}}).+?()$`, "s")                 // postfix: keep the first `padding` chars
  ];
  const parts = str.split(RegExp(`(${pattern})`, "s"));
  return parts.length < 2 ? str
    : parts.map((part, i, {length}) =>
        i % 2 ? part : part.replace(re[(i > 0) + (i == length - 1)], `$1${sep}$2`)
      ).join("");
}
const pattern = '<mark>.+?</mark>'; // Lazy "+", and no capture group of its own (the function adds one)
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx'
const result = truncateBetweenPattern(str, pattern, 3);
console.log(result);
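The approach depends on how `String.prototype.split` treats a regex with a capturing group: the captured text is spliced into the result array, so the matches end up at odd indices. The sketch below shows this on the sample string, labels each part with the regex that `(i > 0) + (i == length - 1)` would select, and shows why the pattern should not carry its own capture group (each extra group adds another entry). Variable names are illustrative.

```javascript
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx';

// A capturing group in the split regex keeps each match in the output array.
const parts = str.split(/(<mark>.+?<\/mark>)/s);
console.log(parts);

// Odd indices hold the matches; even indices hold the text around them.
// (i > 0) + (i == length - 1) yields 0 for the prefix, 1 for middle parts, 2 for the postfix.
const kinds = parts.map((part, i, { length }) =>
  i % 2 ? 'match' : ['prefix', 'middle', 'postfix'][(i > 0) + (i == length - 1)]
);
console.log(kinds);

// Wrapping a pattern that already has its own group yields two captures per match,
// so the "odd index = match" rule no longer holds.
const doubled = 'a<mark>x</mark>b'.split(/((<mark>.+?<\/mark>))/s);
console.log(doubled);
```

This is why the answer passes `'<mark>.+?</mark>'` rather than the question's `'(<mark>.+</mark>)'`: the function adds the one capture group it needs.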
Upvotes: 2
Reputation: 77
The following code should do what you want:
const truncateBetweenPattern = (str, pattern, padding=0, sep='...') => {
  const regex = new RegExp(pattern, 'gs'); // 's' so that '.' also matches line breaks
  let result = '';
  let last = 0;      // end index of the previous match
  let first = true;  // true until the first match has been handled
  let match;
  while ((match = regex.exec(str)) !== null) {
    if (match[0] === '') { regex.lastIndex++; continue; } // avoid looping on empty matches
    const gap = str.slice(last, match.index);
    if (first) {
      // Truncate the left portion (before the first match) if needed.
      result += gap.length > padding ? sep + gap.slice(gap.length - padding) : gap;
    } else {
      // Truncate the middle portion (between two matches) if needed.
      result += gap.length > padding * 2
        ? gap.slice(0, padding) + sep + gap.slice(gap.length - padding)
        : gap;
    }
    // Keep the match itself unchanged.
    result += match[0];
    last = regex.lastIndex;
    first = false;
  }
  // No match at all: return the original string.
  if (first) return str;
  // Truncate the right portion (after the last match) if needed.
  const rest = str.slice(last);
  result += rest.length > padding ? rest.slice(0, padding) + sep : rest;
  return result;
}
Here's an example of how to use the function:
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx';
const pattern = '<mark>.+?</mark>'; // lazy "+", so each <mark> element is a separate match
const result = truncateBetweenPattern(str, pattern, 3);
console.log(result);
This outputs:
...xxx<mark>foo</mark>xx<mark>bar</mark>xxx...xxx<mark>baz</mark>xxx...
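The pattern matters as much as the loop: with a greedy `.+`, as in the question's `'(<mark>.+</mark>)'`, a single match runs from the first `<mark>` to the last `</mark>`, so there are no in-between portions left to truncate. A quick comparison on the sample string (variable names are illustrative):

```javascript
const str = 'xxxxx<mark>foo</mark>xx<mark>bar</mark>xxxxxxxxx<mark>baz</mark>xxxxxxxx';

// Greedy: .+ consumes as much as possible, then backtracks only to the LAST </mark>.
const greedy = str.match(/<mark>.+<\/mark>/g);

// Lazy: .+? stops at the FIRST </mark> that follows each <mark>.
const lazy = str.match(/<mark>.+?<\/mark>/g);

console.log(greedy.length, greedy[0]);
console.log(lazy.length, lazy);
```

The greedy version returns one match spanning `foo` through `baz`; the lazy version returns the three `<mark>` elements separately.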
Upvotes: 1