Reputation: 538
I am working on extracting image filenames from XML files, where the images are linked like this:
<text>
![Image description](iuiFE240H-dM_2DAHpuRxt.jpg)
</text>
<text>
![Image description](9u0I7ExVD0bzSfRIyEiH.png)
</text>
<text>
![Image description]( 0eA0SaTj8d90aHrs72rC.jpg )
</text>
Notice that the image filename sometimes starts immediately after the ( and sometimes after whitespace, and there may also be whitespace before the closing ). Images are jpg or png. Also notice from the first image that underscores and dashes can appear in the file names. Any help with a regex for this would be much appreciated. I have written a function that loops through the string version of the files to extract the images, but it looks very messy.
Upvotes: 1
Views: 72
Reputation: 626738
A naive approach would be to get any non-whitespace chunk of text after ]( and optional whitespaces:
/]\(\s*(\S+)\s*\)/g
To make it more precise, add more contextual subpatterns, like
/!\[Image description]\(\s*(\S+)\s*\)/g
/]\(\s*([^\s)]+\.(?:jpe?g|png))\s*\)/gi
etc.
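To illustrate why the extra context helps, here is a minimal sketch comparing the naive pattern with the extension-restricted one. The sample string and the `extractAll` helper are made up for this illustration and are not part of the question.

```javascript
// Hypothetical helper: collect Group 1 from every match of a global regex.
function extractAll(regex, text) {
  const results = [];
  let match;
  regex.lastIndex = 0; // reset in case the regex object was used before
  while ((match = regex.exec(text)) !== null) {
    results.push(match[1]);
  }
  return results;
}

// Made-up sample: a regular link, a padded image with an upper-case
// extension, and an image whose name contains a dash and an underscore.
const sample = [
  '[docs](https://example.com/page)',
  '![Image description]( photo_01.JPG )',
  '![Image description](a-b_c.png)',
].join('\n');

const naive = /]\(\s*(\S+)\s*\)/g;
const strict = /]\(\s*([^\s)]+\.(?:jpe?g|png))\s*\)/gi;

console.log(extractAll(naive, sample));  // also picks up the non-image link
console.log(extractAll(strict, sample)); // only the jpg/jpeg/png targets
```

The naive pattern returns the URL of the ordinary link as well, while the stricter one keeps only targets ending in an image extension.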
Details:
]\( - matches the ]( char sequence
\s* - 0+ whitespaces
(\S+) - 1+ non-whitespace characters
\s* - 0+ whitespaces
\) - a literal )
More details:
[^\s)]+ - matches 1 or more chars other than whitespaces and )
\. - a dot
(?:jpe?g|png) - either jpg, or jpeg, or png
/i - case insensitive matching is enabled
/g - global modifier is on to match multiple occurrences.
var regex = /]\(\s*(\S+)\s*\)/g;
var str = `<text>
![Image description](iuiFE240H-dM_2DAHpuRxt.jpg)
</text>
<text>
![Image description](9u0I7ExVD0bzSfRIyEiH.png)
</text>
<text>
![Image description]( 0eA0SaTj8d90aHrs72rC.jpg )
</text>`;
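var res = [];
var m; // current match; declared here to avoid creating an implicit global
while ((m = regex.exec(str)) !== null) {
  res.push(m[1]); // Group 1 holds the filename without surrounding whitespace
}
console.log(res);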
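In environments that support String.prototype.matchAll (ES2020), the same extraction can be written without the explicit loop. This is only a sketch: the `extractImageNames` name is hypothetical, and it uses the extension-restricted pattern rather than the naive one.

```javascript
// Hypothetical helper name; returns every captured image filename in the text.
function extractImageNames(text) {
  const imagePattern = /]\(\s*([^\s)]+\.(?:jpe?g|png))\s*\)/gi;
  // matchAll requires the g flag and yields one match array per occurrence.
  return Array.from(text.matchAll(imagePattern), (m) => m[1]);
}

const xml = `<text>
![Image description](iuiFE240H-dM_2DAHpuRxt.jpg)
</text>
<text>
![Image description]( 0eA0SaTj8d90aHrs72rC.jpg )
</text>`;

console.log(extractImageNames(xml));
```

Because the regex is created inside the function, there is no shared lastIndex state to reset between calls.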
Upvotes: 1