whiskycat
whiskycat

Reputation: 11

Xpath return tag position (which contains text) in a set

i would like to return the position number of the first tag that contains a choosen text in a set of tags.

I want to set up a google sheet with 4 columns :

1 : the URL (https://www.castorama.fr/electricite/interrupteur-et-prise/interrupteur-et-prise-etanche/cat_id_4414.cat)

2 : the number of products on the category page :

    =IMPORTXML($B4;"count(//p[@data-test-id='productTitle'])")

3 : The number of products which its title contains "Legrand" or "Céliane with Netatmo" :

    =IMPORTXML($B4;"count(//p[@data-test-id='productTitle'][contains(.,'Céliane with Netatmo')or contains(.,'Legrand')])")

The above formulas are good. I'm having a hard time on the fourth one :

4 : The position of the first product which its title contains "Legrand" or "Céliane with Netatmo".

For example, column 2 returns 24 (products), and column 3 returns 12 (products). If the first product is a product with "Legrand" or "Céliane with Netatmo", i want the column 4 to return 1. If it is in 4th position, i want it to return 4.

i tried

    =IMPORTXML($B4;"//div//p[@data-test-id='productTitle'][.='Legrand']/preceding-subling::*)+1")

but it seems that i don't have the good order for the "//div//p". i bet it have to do with the subling elements, before the "/" , i must not have the correct selected tags.

Upvotes: 0

Views: 67

Answers (2)

E.Wiest
E.Wiest

Reputation: 5915

Another option with the following XPath (we use () [] and preceding axis) :

=IMPORTXML(B4;"count((//p[@data-test-id='productTitle'][contains(.,'Céliane with Netatmo')or contains(.,'Legrand')])[1]/preceding::div[@data-test-id='product-panel'])+1")

Output :

LG

For safety reason (if there's no Legrand product on the page), formula in E4 should be :

=IF(D4=0;"NA";IMPORTXML(B4;"count((//p[@data-test-id='productTitle'][contains(.,'Céliane with Netatmo')or contains(.,'Legrand')])[1]/preceding::div[@data-test-id='product-panel'])+1"))

Castorama pour ré-u-ssir !

Upvotes: 0

Jack Fleeting
Jack Fleeting

Reputation: 24930

They really hid the data very deeply on that page....

Try (using "Céliane" for this example; obviously you can modify it further):

=IMPORTXML($B4,"count((//li[@class='b9bdc658'][.//p[@data-test-id='productTitle'][contains(.,'Céliane')]])[1]/preceding-sibling::*)+1")

Upvotes: 1

Related Questions