Enrico Mendiola

Reputation: 131

How to scrape data using xpath contains?

How can I exclude elements from being scraped using contains with OR? The XPath I currently use is not working: //div/li[contains(text(), 'Night') OR contains(text(), 'Big')

Upvotes: 0

Views: 138

Answers (3)

E.Wiest

Reputation: 5905

To complete @Sergii Dmytrenko's answer: also use a lowercase or operator, since XPath operator names are case sensitive.

//div/li[contains(text(), 'Night') or contains(text(), 'Big')]

The preceding XPath selects li elements whose text contains "Night" or "Big" (case sensitive).
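
As a rough illustration of that predicate, here is a sketch that uses only Python's standard library. ElementTree's limited XPath subset has no contains() function, so the test is mirrored with Python's in operator. The sample markup and the helper name select_li are made up for this example, and it assumes each li holds plain text with no child elements. With an engine that supports full XPath 1.0 (lxml, for instance), the expression above could be passed to xpath() as written.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the question.
SAMPLE = """
<div>
  <li>Night shift</li>
  <li>Big fish</li>
  <li>night owl</li>
  <li>Other</li>
</div>
"""

def select_li(div, words):
    # Mirrors //div/li[contains(text(), w1) or contains(text(), w2)].
    # Assumes each li holds plain text only (no child elements).
    return [li.text for li in div.findall("li")
            if any(word in (li.text or "") for word in words)]

root = ET.fromstring(SAMPLE)  # root is the <div> element
selected = select_li(root, ["Night", "Big"])
print(selected)  # "night owl" is skipped because the match is case sensitive
```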

To exclude elements, you can use the not() function, as previously described.

Side note : using != (not equal) with the and operator is another way to exclude elements :

//div/li[text()!='Night' and text()!='Big']

This excludes only the elements whose text is exactly "Night" or "Big" (with no additional text).
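
The difference between the exact comparison and the contains() form can be sketched in plain Python. This is only an approximation of the two predicates: the markup is hypothetical, and it assumes every li has exactly one text node and no child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from the question.
root = ET.fromstring(
    "<div><li>Night</li><li>Night shift</li><li>Big</li><li>Other</li></div>"
)
texts = [li.text or "" for li in root.findall("li")]

# Mirrors li[text()!='Night' and text()!='Big']: drops exact matches only.
not_equal = [t for t in texts if t != "Night" and t != "Big"]

# Mirrors li[not(contains(text(),'Night') or contains(text(),'Big'))]:
# drops every li whose text includes either word.
not_containing = [t for t in texts if not ("Night" in t or "Big" in t)]

print(not_equal)       # "Night shift" survives the exact comparison
print(not_containing)  # only "Other" survives the contains() form
```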


EDIT : Assuming you have :

<div>
  <h2>Night of the living dead</h2>
  <h2>Big fish</h2>
  <h2>Save the last dance</h2>
  <h2>Tomorrow never die</h2>
  <h2>Australia nuclear war</h2>
</div>

To select elements which don't contain "Night", "Big", or "Australia", you have two options :

Using or operators inside a not condition :

//div/h2[not(contains(text(),'Night') or contains(text(),'Big') or contains(text(),'Australia'))]

Using multiple not with and operators :

//div/h2[not(contains(text(),'Night')) and not(contains(text(),'Big')) and not(contains(text(),'Australia'))]

Output : 2 nodes :

Save the last dance
Tomorrow never die
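
The two options are equivalent by De Morgan's law. A minimal sketch of that equivalence on the sample markup above, using only Python's standard library: ElementTree cannot evaluate contains(), so each predicate is mirrored with Python's in operator, and the variable names are chosen just for this example.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the answer above.
SAMPLE = """
<div>
  <h2>Night of the living dead</h2>
  <h2>Big fish</h2>
  <h2>Save the last dance</h2>
  <h2>Tomorrow never die</h2>
  <h2>Australia nuclear war</h2>
</div>
"""
EXCLUDED = ("Night", "Big", "Australia")

root = ET.fromstring(SAMPLE)  # root is the <div> element
titles = [h2.text or "" for h2 in root.findall("h2")]

# Option 1: not(a or b or c)
option_or = [t for t in titles if not any(w in t for w in EXCLUDED)]

# Option 2: not(a) and not(b) and not(c)
option_and = [t for t in titles if all(w not in t for w in EXCLUDED)]

print(option_or)
print(option_or == option_and)
```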

Upvotes: 1

sspsujit

Reputation: 301

Your XPath expression (once the typos are corrected: li[contains(text(), 'Night') or contains(text(), 'Big')]) will return li elements whose text contains "Night" or "Big".

To exclude these, the correct expression is:

//div/li[not(contains(text(), 'Night') or contains(text(), 'Big'))]

Alternatively, you may try:

//div/li[not(contains(text(), 'Night')) and not(contains(text(), 'Big'))]

Upvotes: 1

Sergii Dmytrenko

Reputation: 176

  1. Your XPath should end with ']'; as written, it is invalid.

  2. If you would like to exclude 'Night' and 'Big' you may try this:

    //div/li[not(contains(text(), 'Night') or contains(text(), 'Big'))]

Upvotes: 0
