Joe

Reputation: 25

XPath and IMPORTXML no longer working

A year ago I wrote code that retrieves information with IMPORTXML and XPath, but for the past few weeks it has stopped working and I can't find the problem.

For example, I would like to retrieve the number of employees from this page: https://www.societe.com/societe/patisserie-thomas-753249192.html (labelled in French as "Tranche d’effectif").

Specifically, I want to get the value "6 to 9 employees" by applying a regular expression to the word "salariés" (employees), which lets me extract the headcount. I want to do the same for the other fields (Adresse postale, SIREN, etc.).
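A minimal Python sketch of the regular-expression step, handy for testing the pattern outside the sheet. It assumes the page shows the headcount as French text such as "6 à 9 salariés"; that wording, the pattern and the function name headcount_range are illustrative assumptions, not taken from the original sheet. In Google Sheets, a similar pattern would go into REGEXEXTRACT.

```python
import re
from typing import Optional, Tuple

# Assumed wording on the page: "<low> à <high> salariés" (not confirmed).
# In Python 3, \s also matches non-breaking spaces, common in French text.
HEADCOUNT = re.compile(r"(\d+)\s*à\s*(\d+)\s*salariés?")


def headcount_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (low, high) from text such as '6 à 9 salariés', else None."""
    match = HEADCOUNT.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(headcount_range("Tranche d'effectif : 6 à 9 salariés"))  # (6, 9)
print(headcount_range("Adresse postale : 1 rue Exemple"))  # None
```

The two capture groups keep the lower and upper bounds separate, so they can be compared or sorted as numbers instead of text.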

I wrote the XPath //*[@id="search"]/div[1]/a/@href to get the info into a table, but it is no longer working.
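A positional step such as div[1] depends on the exact order of the page's elements, so one possible (unconfirmed) reason for the breakage is that the site changed its layout. The Python sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and cannot select /@href directly, on two hypothetical, well-formed snippets. The markup and the function name hrefs are invented for illustration and do not come from the real page.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical markup, NOT copied from societe.com: the same result link
# before and after an extra <div> is inserted in front of it.
OLD_PAGE = """
<html><body>
  <div id="search">
    <div><a href="/societe/example-1.html">Example 1</a></div>
  </div>
</body></html>
"""

NEW_PAGE = """
<html><body>
  <div id="search">
    <div class="banner">Ad</div>
    <div><a href="/societe/example-1.html">Example 1</a></div>
  </div>
</body></html>
"""


def hrefs(page: str) -> List[str]:
    """Mimic //*[@id="search"]/div[1]/a/@href with ElementTree's XPath subset."""
    root = ET.fromstring(page.strip())
    # ElementTree cannot return attributes from a path, so read href with .get()
    return [a.get("href") for a in root.findall(".//*[@id='search']/div[1]/a")]


print(hrefs(OLD_PAGE))  # ['/societe/example-1.html']
print(hrefs(NEW_PAGE))  # []
```

In the second snippet, div[1] now points at the banner, which has no link, so the same expression silently returns nothing instead of raising an error.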

Here is how I retrieve the info. CompagnieName is just an example and can be replaced with any company. I think the XPath expression is wrong, but I can't work out what to change (a problem with the div, or something else).

[Screenshot]

Another screenshot:

[Screenshot]

The information should then appear in the following form:

[Screenshot: expected output]

Any solution or suggested changes would be a great help.

Thanks a lot!

Upvotes: 0

Views: 41

Answers (2)

Krzysztof Dołęgowski

Reputation: 2660

I think I've got it:

[Screenshot]

First, the search URL had the wrong structure. Then I tried the XPath //div/div[@class='Card frame']/a/@href, which lists all the URLs inside div elements whose class attribute is exactly 'Card frame'. I also think you need to put + between the words of each query URL that contains more than one word; that's why I added this in resultat (B4).

[Screenshot]
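A rough Python sketch of both points, assuming a hypothetical search URL (the answer does not give the corrected one) and hypothetical result markup; SEARCH_BASE, search_url and card_links are illustrative names. quote_plus from urllib.parse turns spaces into +, matching the suggestion for multi-word queries (in the sheet, SUBSTITUTE(text; " "; "+") would do the space replacement). The second part shows that [@class='Card frame'] compares the whole class attribute as one string.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote_plus

# Hypothetical base URL: the real societe.com search URL is not given above.
SEARCH_BASE = "https://example.com/search?q="


def search_url(company: str) -> str:
    """Join words with '+'; quote_plus also percent-encodes accented letters."""
    return SEARCH_BASE + quote_plus(company)


# Hypothetical result markup, shaped only to fit //div/div[@class='Card frame']/a/@href.
RESULTS = """
<html><body><div>
  <div class="Card frame"><a href="/societe/result-1.html">Result 1</a></div>
  <div class="Card frame"><a href="/societe/result-2.html">Result 2</a></div>
  <div class="Card frame highlighted"><a href="/societe/result-3.html">Result 3</a></div>
</div></body></html>
"""


def card_links(page: str) -> List[str]:
    root = ET.fromstring(page.strip())
    # [@class='Card frame'] is an exact string comparison (as in XPath 1.0),
    # so the third card, which has an extra class, is not matched.
    return [a.get("href") for a in root.findall(".//div/div[@class='Card frame']/a")]


print(search_url("patisserie thomas"))  # https://example.com/search?q=patisserie+thomas
print(card_links(RESULTS))  # ['/societe/result-1.html', '/societe/result-2.html']
```

If the real page ever adds a class to its cards, an exact class match like this one will stop finding them, which is worth checking in the developer tools when results suddenly disappear.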

I'm not sure this is what you were looking for, but it returns the URLs of all the places in the results.

I added my work as a new Sheet in your file.

Upvotes: 1

Krzysztof Dołęgowski

Reputation: 2660

It's difficult to help without access to your file.

You asked about the number of employees on your example page. You can import it with:

=importxml("https://www.societe.com/societe/patisserie-thomas-753249192.html";"//div[@id='trancheeff-histo-description']")

It returns 5 to 9

You can find the right id in Chrome's developer tools (F12).
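Selecting by id, as in the formula above, does not depend on how deeply the element is nested, which tends to make it sturdier than a positional path. Below is a minimal Python sketch of the same lookup with xml.etree.ElementTree on hypothetical markup: only the id trancheeff-histo-description comes from this answer, while the wrapper divs, the text and the helper name text_by_id are invented. Real pages usually need an HTML parser rather than an XML one, so this only checks the expression logic.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: only the id is taken from the answer; the nesting
# and the text are made up to show that depth does not matter.
PAGE = """
<html><body>
  <div class="layout"><div><div>
    <div id="trancheeff-histo-description">6 à 9 salariés</div>
  </div></div></div>
</body></html>
"""


def text_by_id(page: str, element_id: str) -> Optional[str]:
    """Rough local analogue of //div[@id='...'] on a well-formed string."""
    root = ET.fromstring(page.strip())
    node = root.find(f".//div[@id='{element_id}']")
    if node is None:
        return None
    # itertext() concatenates the div's own text and any child text
    return "".join(node.itertext()).strip()


print(text_by_id(PAGE, "trancheeff-histo-description"))  # 6 à 9 salariés
```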

Upvotes: 1
