Reputation: 25
A year ago I wrote some code that retrieves information with IMPORTXML and XPath, but it stopped working a few weeks ago and I can't find the problem.
For example, I would like to retrieve the number of employees from this page: https://www.societe.com/societe/patisserie-thomas-753249192.html (labelled in French as "Tranche d’effectif").
Specifically, I would like to retrieve the text "6 to 9 employees" by applying a regular expression to the word "salariés" (employees), which then gives me the workforce. The same goes for the other fields (Adresse postale, SIREN, etc.).
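As a rough sketch of the regular-expression step, the Python function below pulls the lower and upper bounds out of a band written as "6 à 9 salariés". The sample string is hypothetical and the exact French wording on the live page is an assumption; adjust the pattern if the site phrases it differently.

```python
import re

# Hypothetical sample of the scraped text; the real page wording may differ.
SAMPLE = "Tranche d'effectif : 6 à 9 salariés"

# Capture the two numbers of a band written as "<low> à <high> salariés".
BAND = re.compile(r"(\d+)\s*à\s*(\d+)\s*salariés")


def workforce_band(text: str):
    """Return (low, high) as ints, or None if no band is found."""
    match = BAND.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(workforce_band(SAMPLE))  # (6, 9)
```

In the sheet itself, a similar pattern could be passed to REGEXEXTRACT, which uses RE2 syntax; the \d and \s classes used here are supported there.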
I wrote the XPath //*[@id="search"]/div[1]/a/@href
to get the info into a table, but it doesn't work.
Here is how I retrieve the info. CompagnieName is just an example and can be replaced with any company. I think the XPath line is incorrect, but I can't find what to change (a problem with the div or something else).
Afterwards, the info should appear in the following form.
If you have a solution or changes I could make, that would be a great help.
Thanks a lot!
Upvotes: 0
Views: 41
Reputation: 2660
I think I've got it:
First, the search URL had the wrong structure.
Then I tried the XPath: //div/div[@class='Card frame']/a/@href
This lists the URLs (href attributes) of all links directly inside divs whose class is 'Card frame'.
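To see what that XPath selects, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It supports only a subset of XPath and cannot return @href directly, so the attribute is read with .get(). The markup is a made-up, simplified stand-in for the results page (the real page is HTML and not guaranteed to parse as XML), and the second company link is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified search-results markup; real classes and nesting may differ.
RESULTS = """
<html><body>
  <div id="search">
    <div class="Card frame"><a href="/societe/patisserie-thomas-753249192.html">Patisserie Thomas</a></div>
    <div class="Card frame"><a href="/societe/hypothetical-example.html">Example</a></div>
    <div class="Card"><a href="/pub">Ad</a></div>
  </div>
</body></html>
"""


def card_links(markup: str) -> list[str]:
    root = ET.fromstring(markup.strip())
    # Equivalent of //div/div[@class='Card frame']/a: every <a> child of a <div>
    # whose class attribute is exactly "Card frame" and whose parent is a <div>.
    return [a.get("href") for a in root.iterfind(".//div/div[@class='Card frame']/a")]


print(card_links(RESULTS))
# ['/societe/patisserie-thomas-753249192.html', '/societe/hypothetical-example.html']
```

Note that the class test is an exact string match, so a div with class "Card" (or "Card frame active") is not selected.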
Also, I think you need to join the words with + in every query URL that contains more than one word.
That's why I added this in resultat (B4).
I am not sure this is what you were looking for, but it returns all the URLs of the places in the results.
I added my work as a new sheet in your file.
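A minimal sketch of building a multi-word query URL with + between the words, using Python's urllib.parse.quote_plus. The base URL and its champs parameter are assumptions about how the site's search works; copy the real one from your browser's address bar after running a search.

```python
from urllib.parse import quote_plus

# Assumption: the search takes the query in a "champs" parameter; verify in a browser.
SEARCH_BASE = "https://www.societe.com/cgi-bin/search?champs="


def search_url(company_name: str) -> str:
    # quote_plus turns spaces into "+" and percent-encodes other unsafe
    # characters as UTF-8 (e.g. "é" becomes "%C3%A9").
    return SEARCH_BASE + quote_plus(company_name.strip())


print(search_url("patisserie thomas"))
# https://www.societe.com/cgi-bin/search?champs=patisserie+thomas
```

In the sheet, a similar effect can be had by applying SUBSTITUTE to the cell holding the company name, replacing " " with "+".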
Upvotes: 1
Reputation: 2660
It's difficult to work on this when you don't share your file.
You asked about the number of employees for your example page. You can import it using:
=importxml("https://www.societe.com/societe/patisserie-thomas-753249192.html";"//div[@id='trancheeff-histo-description']")
It returns 5 to 9.
You can find the right id in the Chrome developer tools (F12).
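The id-based lookup can be sketched the same way in Python: find the element by its id and collect all of its text. The page fragment is a simplified, hypothetical stand-in (the real page may wrap the text in child elements, which itertext() also picks up), and text_by_id is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of the company page.
PAGE = """
<html><body>
  <div id="trancheeff-histo-description"><span>6 à 9</span> salariés</div>
</body></html>
"""


def text_by_id(markup: str, element_id: str) -> str:
    root = ET.fromstring(markup.strip())
    # Same idea as //div[@id='...'] in IMPORTXML: matching on an id is usually
    # less fragile than a positional path such as div[1] if the layout changes.
    node = root.find(f".//div[@id='{element_id}']")
    if node is None:
        return ""
    # Join the text of the element and its children, collapsing whitespace.
    return " ".join("".join(node.itertext()).split())


print(text_by_id(PAGE, "trancheeff-histo-description"))  # 6 à 9 salariés
```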
Upvotes: 1