Kenneth

Reputation: 28737

Regex with capturing groups

I'm trying to extract file names from a long text.

Given the following text:

Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html

I want a list of all the file names:

mysource.html
anothersource.html

I've been trying to get it with the following regular expressions:

// This only gets the last one (because of the greedy .*)
Page source.*\/(.*\.html)

// This gets all occurrences, but the value in my capture group is the
// complete path starting after the first occurrence of /
Page source.*?\/(.*?\.html)

How can I tell the regex engine to be non-greedy for the outside expression, but still greedy enough to go to the last / before the .html part?

Upvotes: 1

Views: 85

Answers (1)

Dmitry Egorov

Reputation: 9650

Page source.*?([^\/]+?\.html)

Demo: https://regex101.com/r/uX6fY2/2
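
As a rough illustration of why this works: the character class [^\/] cannot match a slash, so the capture group can only begin after the last / that precedes ".html", while the lazy .*? in front skips everything up to that point. The sketch below uses Python's re module as an assumption (the question does not name a regex engine), and the variable names are made up for the example. It also runs the second attempt from the question for comparison.

```python
import re

# Sample text copied from the question.
text = (
    "Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text\n"
    "Lorem Ipsum ...\n"
    "Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html\n"
)

# Pattern from the answer. Escaping the slash is optional in Python,
# but it is kept here so the pattern matches the answer exactly.
pattern = re.compile(r"Page source.*?([^\/]+?\.html)")

# With a single capture group, findall returns the group contents only.
names = pattern.findall(text)
print(names)  # ['mysource.html', 'anothersource.html']

# Second attempt from the question: the group starts right after the
# first slash, so it captures the rest of the path as well.
lazy_attempt = re.findall(r"Page source.*?\/(.*?\.html)", text)
print(lazy_attempt)
```

Note that [^\/] also matches a newline, so on input where a "Page source" line contains no .html name, the group could pick up text from a following line; using [^\/\s] instead would be one way to guard against that if file names never contain whitespace.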

Upvotes: 7
