LucSpan

Reputation: 1971

Xpath: obtain href if contains specific word

Set up

I'm extracting hrefs from a page using the following xpath,

'/html/body/div/div[2]/div[2]/div/div/p[1]/a/@href'

which gives me a list of hrefs looking like,

['#',
 'showv2.php?p=Glasgow City&t=Anderston',
 'showv2.php?p=Glasgow City&t=Anniesland',
 'showv2.php?p=Glasgow City&t=Ashfield',
 '#',
 'showv2.php?p=Glasgow City&t=Baillieston',
 ⋮
 'showv2.php?p=Glasgow City&t=Yoker']


Problem

I'm not interested in the '#' hrefs. All the hrefs I am interested in contain Glasgow. How do I select only the hrefs containing Glasgow?

I've seen answers that use regex on attributes such as 'id', but none that target href, and those approaches do not seem to work with href.

I've also seen answers that match on the beginning or end of an href, but I'd like to match any href that contains a given word.

Upvotes: 1

Views: 1132

Answers (1)

Wiktor Stribiżew

Reputation: 627082

Use a contains(@href, "Glasgow") predicate (a "restriction") on the a elements:

'/html/body/div/div[2]/div[2]/div/div/p[1]/a[contains(@href, "Glasgow")]/@href'

The expression then matches only those <a> elements under the specified path whose href attribute value contains Glasgow.
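
For illustration, here is a minimal sketch of that predicate in Python. It assumes the third-party lxml library is what evaluates the XPath (the question does not say which tool is used), and the SAMPLE markup is a hypothetical, simplified stand-in for the real page, so the path is shortened to //p[1]/a.

```python
from lxml import html  # third-party; assumed here, the question does not name a library

# Hypothetical, simplified markup standing in for the real page.
SAMPLE = """
<html><body>
  <p>
    <a href="#">A</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Anderston">Anderston</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Anniesland">Anniesland</a>
    <a href="#">B</a>
    <a href="showv2.php?p=Glasgow City&amp;t=Baillieston">Baillieston</a>
  </p>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The predicate [contains(@href, "Glasgow")] keeps only the <a> elements
# whose href value includes the substring "Glasgow"; the '#' links are dropped.
hrefs = tree.xpath('//p[1]/a[contains(@href, "Glasgow")]/@href')

for href in hrefs:
    print(href)
```

Note that contains() is a plain, case-sensitive substring test, so no regex is needed for a "contains a word" match.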

Upvotes: 3
