Regex with capturing groups

Question

I'm trying to extract file names from a long text.

The filenames are all in a path
The path is always prefixed with the text Page source
They can appear anywhere on a line
The text contains multiple lines
All filenames end with .html

Given the following text:

Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html

I want a list of all the file names:

mysource.html
anothersource.html

I've been trying to get it with the following regular expressions:

// This only gets the last one (because of the greedy .*)
Page source.*\/(.*\.html)

// This gets all occurrences, but the value in my capture group is the
// complete path starting after the first occurrence of /
Page source.*?\/(.*?\.html)

How can I tell the regex engine to be non-greedy for the outside expression, but still greedy enough to go to the last / before the .html part?
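One possible approach, sketched here in Python since the question does not name a regex engine (the language, the function name extract_html_filenames and the SAMPLE constant are all assumptions for illustration): instead of tuning greediness, exclude / from the capture group so it can only begin after the last slash, and confine the outer part to the path itself with \S*. This assumes paths contain no whitespace.

```python
import re

# Sample text taken from the question.
SAMPLE = (
    "Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text\n"
    "Lorem Ipsum ...\n"
    "Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html\n"
)

# \S*          greedy, but limited to the path token (no whitespace assumed)
# /            the last slash before the file name (found by backtracking)
# [^/\s]+      the file name: it cannot contain "/" or whitespace
# \.html       the required extension
PATTERN = re.compile(r"Page source\s+\S*/([^/\s]+\.html)")


def extract_html_filenames(text):
    """Return the .html file names that follow a 'Page source' prefix."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    print(extract_html_filenames(SAMPLE))
```

Because the group itself cannot cross a /, it does not matter whether the part in front of it is greedy or lazy. The same pattern should carry over to most Perl-style engines; where / is the pattern delimiter it needs escaping as \/.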

