Omair Vaiyani

Reputation: 33

Regex - invalid expression error "Nothing to repeat at..."

I'm trying to build a regex to capture the useful part of my S3 upload filenames. I used a regex generator, and so far I have this pattern (which throws an error in JavaScript):

/[A-Za-z]++[^\.\w][^\.]++|(?<=_)\w++(?=\.)/g

Here are some example strings that I am working with (with the required match for each):

"MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf" // desired match "Bruntwood Leeds Digital Festival ad"

"bbZRU3329BfXXvvAWwP_short-video.mp4" // desired match "short-video"

"zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx" // desired match "MGI-Artificial-Intelligence-Discussion-slides"

If it helps: I need to run this regex in JavaScript.

const filename = "bbZRU3329BfXXvvAWwP_short-video.mp4";
const match = filename.match(regex);
console.log(match); // "short-video"

Thank you!

Upvotes: 0

Views: 66

Answers (4)

The fourth bird

Reputation: 163277

For these example strings you could split on either a dot or an underscore, using the character class [_.]

That will give you an array with 3 parts. The value you are looking for is the second part, at index [1]:

const strings = [
  "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
  "bbZRU3329BfXXvvAWwP_short-video.mp4",
  "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx"
];

strings.forEach((s) => console.log(s.split(/[_.]/)[1]));
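Note that the split only works while the name itself contains no extra underscores or dots. If that might not hold, here is a minimal sketch of an alternative, assuming the random prefix never contains an underscore and the extension is whatever follows the last dot. The helper name extractName is made up for illustration:

```javascript
// Hypothetical helper (not from the question): returns the text between
// the first "_" and the last "." of a filename, or null if either is missing.
function extractName(filename) {
  const start = filename.indexOf("_");
  const end = filename.lastIndexOf(".");
  if (start === -1 || end <= start) {
    return null;
  }
  return filename.slice(start + 1, end);
}

console.log(extractName("bbZRU3329BfXXvvAWwP_short-video.mp4")); // "short-video"
console.log(extractName("abc123_my_report.v2.pdf"));             // "my_report.v2"
```

For the second input, split(/[_.]/)[1] would return only "my".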

Upvotes: 0

revo

Reputation: 48711

Don't use a regex generator unless it targets the regex flavor you will actually run, because syntax and features differ between flavors. What you are basically doing is this:

_[^.]+

The only difference is that it also matches the preceding _ character, which you can strip afterwards in JS.

Live demo

var text = `MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf
bbZRU3329BfXXvvAWwP_short-video.mp4
zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides`;

console.log(
  text.match(/_[^.]+/g).map(v => v.slice(1)) // slice(1) drops the leading "_"
);
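If the code only has to run on engines that implement ES2018 look-behind (an assumption; older browsers and Node versions reject this syntax), the underscore can be excluded from the match directly. A small sketch, with the variable names chosen here for illustration:

```javascript
// Assumes an ES2018+ engine: (?<=_) requires a "_" just before the match
// without including it in the result.
const names = [
  "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
  "bbZRU3329BfXXvvAWwP_short-video.mp4",
];

// Built with new RegExp so an older engine fails at runtime, not at parse time.
const afterUnderscore = new RegExp("(?<=_)[^.]+");

const results = names.map((name) => {
  const m = name.match(afterUnderscore);
  return m ? m[0] : null;
});

console.log(results);
```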

Upvotes: 1

Jeto

Reputation: 14927

Given your examples, you could use a much simpler regex:

const regex = /_([^.]+)/;

const inputs = [
  "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf", // desired match "Bruntwood Leeds Digital Festival ad"
  "bbZRU3329BfXXvvAWwP_short-video.mp4", // desired match "short-video"
  "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx" // desired match "MGI-Artificial-Intelligence-Discussion-slides"
];

for (const input of inputs) {
  const match = input.match(regex);
  console.log(match[1]);
}

Upvotes: 3

melpomene

Reputation: 85767

> I used a regex generator

But not for JavaScript regexes, it seems. Every tool and library has its own regex quirks. In particular, JS doesn't support possessive quantifiers like ++ (nor independent submatches in general, (?> )).

Look-behind, (?<= ), was only added to JS in ES2018, so older engines reject it as well.
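To see where the "Nothing to repeat" error comes from, here is a small sketch that tries to compile a pattern and reports whether the engine accepts it. The helper name compiles is made up for illustration:

```javascript
// Hypothetical helper: returns true if the engine accepts the pattern,
// false if it throws a SyntaxError while compiling it.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

// In JS the second "+" is read as a quantifier with nothing to repeat.
console.log(compiles("[A-Za-z]++")); // false
console.log(compiles("[A-Za-z]+"));  // true
```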

You could e.g. do this instead:

const strs = [
    "MTxoZbRRUu9BfQLvAWwP_Bruntwood Leeds Digital Festival ad.pdf",
    "bbZRU3329BfXXvvAWwP_short-video.mp4",
    "zQZFnWVcRUbFNGyGdIP0_MGI-Artificial-Intelligence-Discussion-slides.pptx",
];

for (const str of strs) {
    const m = /_([^.]+)\./.exec(str);
    if (!m) {
        console.log("no match: " + str);
        continue;
    }
    console.log("match: " + m[1]);
}

Upvotes: 2
