Reputation: 417
I want to extract the direct PDF links from a web page. I tried this regex pattern, but it did not work for me:
href=.*\.pdf$
Test data:
<a class="btn btn-small pad-button" href="/Tests/English/english_2011_summer_A-Q_b.pdf">eng1</a><br>
<a href="english_2011_summer_A-Q_c.pdf">eng2</a>
Upvotes: 0
Views: 223
Reputation: 415
Try this pattern and use group 1 to get the exact value:
href="([^"]+\.pdf)"
DEMO: http://regex101.com/r/nR8gY4/1
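As a rough illustration of how group 1 could be read in practice, here is a minimal Python sketch. It assumes the two lines from the question are held in a string named html (a name chosen here for illustration) and that every href uses double quotes.

```python
import re

# Test data copied from the question.
html = '''<a class="btn btn-small pad-button" href="/Tests/English/english_2011_summer_A-Q_b.pdf">eng1</a><br>
<a href="english_2011_summer_A-Q_c.pdf">eng2</a>'''

# Group 1 holds everything between the double quotes, ending in ".pdf".
PDF_HREF = re.compile(r'href="([^"]+\.pdf)"')

# With exactly one capturing group, findall returns the group-1 strings.
links = PDF_HREF.findall(html)
print(links)
# ['/Tests/English/english_2011_summer_A-Q_b.pdf', 'english_2011_summer_A-Q_c.pdf']
```

Note that this pattern would skip an href written with single quotes.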
Upvotes: 0
Reputation: 70722
The main problem is the end-of-string anchor $: the href values do not sit at the end of the string, so the pattern can never match there. I recommend using an HTML parser to extract these values; if you still want to use a regex, I propose something like the following.
href=(["'])([^"']+\.pdf)\1
The values you want can be accessed through capturing group #2.
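To make both suggestions concrete, here is a hedged Python sketch. It first shows that the original pattern finds nothing on the question's data, then reads group 2 of the quote-aware pattern, and finally does the same job with the standard library's html.parser. The names html, single_quoted, and PdfLinkParser are chosen here for illustration, and the single-quoted sample line is hypothetical.

```python
import re
from html.parser import HTMLParser

# Test data copied from the question.
html = '''<a class="btn btn-small pad-button" href="/Tests/English/english_2011_summer_A-Q_b.pdf">eng1</a><br>
<a href="english_2011_summer_A-Q_c.pdf">eng2</a>'''

# The original pattern: ".pdf" is always followed by a quote, never by the
# end of the string, so nothing matches.
original_matches = re.findall(r'href=.*\.pdf$', html)

# Group 1 captures the opening quote, \1 requires the same closing quote,
# and group 2 is the href value itself.
QUOTED_PDF_HREF = re.compile(r'''href=(["'])([^"']+\.pdf)\1''')
regex_links = [m.group(2) for m in QUOTED_PDF_HREF.finditer(html)]

# Hypothetical single-quoted input, to show the backreference at work.
single_quoted = "<a href='notes.pdf'>notes</a>"
single_quoted_link = QUOTED_PDF_HREF.search(single_quoted).group(2)


class PdfLinkParser(HTMLParser):
    """Collects href values of <a> tags that end in .pdf."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".pdf"):
                self.links.append(value)


parser = PdfLinkParser()
parser.feed(html)
parser.close()
parsed_links = parser.links

print(original_matches)  # []
print(regex_links)
print(parsed_links)
```

On this data the regex and the parser return the same two links. The parser version does not depend on attribute order or quoting style, which is the usual argument for preferring it.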
Upvotes: 3
Reputation: 30985
You can use this regex.
href=".*?([\w-]+\.pdf)"
The idea of this regex is to look for every href whose value ends in X.pdf. Note that group 1 captures only the file name, without the directory path.
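A short Python sketch of what this pattern captures on the question's data, assuming the markup is held in a string named html (a name chosen here for illustration). The lazy .*? consumes the directory part, so group 1 is left with the file name only.

```python
import re

# Test data copied from the question.
html = '''<a class="btn btn-small pad-button" href="/Tests/English/english_2011_summer_A-Q_b.pdf">eng1</a><br>
<a href="english_2011_summer_A-Q_c.pdf">eng2</a>'''

# .*? skips the path lazily; [\w-]+ cannot cross "/", so group 1 starts
# right after the last slash (or at the start of the value if there is none).
FILENAME = re.compile(r'href=".*?([\w-]+\.pdf)"')

names = FILENAME.findall(html)
print(names)
# ['english_2011_summer_A-Q_b.pdf', 'english_2011_summer_A-Q_c.pdf']
```

If the full link is needed rather than the file name, the capturing group would have to cover the whole quoted value.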
Upvotes: 1