Reputation: 1971
Set up
I'm extracting hrefs from a page using the following xpath,
'/html/body/div/div[2]/div[2]/div/div/p[1]/a/@href'
which gives me a list of hrefs looking like,
['#',
'showv2.php?p=Glasgow City&t=Anderston',
'showv2.php?p=Glasgow City&t=Anniesland',
'showv2.php?p=Glasgow City&t=Ashfield',
'#',
'showv2.php?p=Glasgow City&t=Baillieston',
⋮
'showv2.php?p=Glasgow City&t=Yoker']
I'm not interested in the '#' hrefs. All the hrefs I am interested in contain Glasgow. How do I select only the hrefs containing Glasgow?
I've seen answers regarding regex with 'id' etc., but not with href. Those answers do not seem to work with href.
I've seen answers regarding regex with beginning or ending of a href, but I'd like to be able to regex on 'containing' a word.
Upvotes: 1
Views: 1132
Reputation: 627082
Use a contains(@href, 'Glasgow') predicate as a "restriction" on the a elements:
'/html/body/div/div[2]/div[2]/div/div/p[1]/a[contains(@href, "Glasgow")]/@href'
Then, it will only find those <a> elements under the specified path that contain Glasgow inside their href attribute values.
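To see the predicate in action, here is a minimal sketch using lxml. The question does not say which library is in use, so lxml is an assumption, and the PAGE markup is a hypothetical, much simplified stand-in for the real page. The shortened //a path is used only because the sample markup lacks the nested div structure.

```python
from lxml import html

# Hypothetical, simplified markup: the real page's structure is not shown
# in the question, so this only mimics the mix of '#' and Glasgow links.
PAGE = """
<html><body>
  <p>
    <a href="#">A</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Anderston">Anderston</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Anniesland">Anniesland</a>
    <a href="#">B</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Baillieston">Baillieston</a>
  </p>
</body></html>
"""

tree = html.fromstring(PAGE)

# Without the predicate: every href, including the '#' placeholders.
all_hrefs = tree.xpath('//a/@href')

# With the predicate: only <a> elements whose href contains "Glasgow".
glasgow_hrefs = tree.xpath('//a[contains(@href, "Glasgow")]/@href')

for href in glasgow_hrefs:
    print(href)
```

Note that contains() is case-sensitive in XPath 1.0, so "glasgow" in lower case would not match these links.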
Upvotes: 3