Reputation: 28737
I'm trying to extract file names from a long text.
Each file name is the last part of a path that follows the text Page source and ends in .html.
Given the following text:
Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html
I want a list of all the file names:
mysource.html
anothersource.html
I've been trying to get it with the following regular expressions:
// this only gets the last one (because of the greedy .*)
Page source.*\/(.*\.html)
// This gets all occurrences, but the value in my capture group is the
// complete path starting after the first occurrence of /
Page source.*?\/(.*?\.html)
How can I tell the regex engine to be non-greedy for the outside expression, but still greedy enough to go to the last / before the .html part?
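For anyone who wants to reproduce the two results described above, here is a minimal sketch in Python. It assumes single-line (DOTALL) mode, so that . can cross line breaks; the question does not say which flags or engine were used, so that part is a guess. The variable names are only illustrative.

```python
import re

# Sample text from the question, joined with newlines.
text = (
    "Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text\n"
    "Lorem Ipsum ...\n"
    "Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html"
)

# Assumption: DOTALL, so "." also matches newlines.
# Greedy: the first .* runs to the end of the text and backtracks to the
# last "/", so there is a single match containing only the last file name.
greedy = re.findall(r"Page source.*\/(.*\.html)", text, flags=re.DOTALL)

# Lazy: .*? stops at the first "/" after "Page source", so the group
# captures everything from there up to ".html".
lazy = re.findall(r"Page source.*?\/(.*?\.html)", text, flags=re.DOTALL)

print(greedy)  # ['anothersource.html']
print(lazy)    # ['//somedir/subdir/subdir/mysource.html', '//anotherdir/sub/dir/anothersource.html']
```

Without DOTALL the greedy pattern matches once per line in Python, so the "only the last one" behavior depends on the flag.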
Upvotes: 1
Views: 85
Reputation: 9650
Page source.*?([^\/]+?\.html)
Demo: https://regex101.com/r/uX6fY2/2
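Why this works: [^\/]+? cannot match a slash, so the capture group can only begin after the last / before .html, while the lazy .*? in front absorbs the rest of the path. A quick sketch of the pattern applied to the sample text from the question, using Python's re module (the choice of Python and the variable names are mine, not part of the demo):

```python
import re

# Sample text from the question, joined with newlines.
text = (
    "Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text\n"
    "Lorem Ipsum ...\n"
    "Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html"
)

# The group excludes "/", so it captures only the final path segment.
pattern = re.compile(r"Page source.*?([^\/]+?\.html)")

names = pattern.findall(text)
print(names)  # ['mysource.html', 'anothersource.html']
```

Note that [^\/] still allows spaces and other characters, so this relies on the file name being the last slash-separated segment before .html, as in the sample.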
Upvotes: 7