Reputation: 37
I am trying to use the importxml function in google sheets scrape the MPN (Manufacturer part number) from a website for a list of products (about 2000).
The website: http://gun.deals/search/apachesolr_search/640832007756
The instructions said to use the importxml formula. The link is A1, then used inspector in chrome to copy the xpath. The result:
=IMPORTXML(A1,"//*[@id='content']/div[3]/div[2]/div/div/div/div/div/div[2]/dl/div[2]/dd/a")
(Instructions also said to change double quotes to single around content.)
The error is imported "content in empty" when running the formula. I have tried to make edits and I think the XPath is incorrect. Some say that you can't copy-paste from inspector as the instructions said.
One other thing I tried was =IMPORTHTML(A1, "list", 8)
. This returns UPC and MPN. But it puts the MPN on the second line.
I am new to this and have searched for the last couple of nights for a solution. any help would be much appreciated.
Also Bonus question. What would the formula be to scrape the UPC if we had the MPN? https://gun.deals/search/apachesolr_search/J941PSL9
Upvotes: 2
Views: 440
Reputation: 201388
If my understanding is correct, how about this sample formula? Please think of this as just one of several answers.
In this pattern, the value of "MPN" is retrieved from the value of "UPC".
=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//li[contains(text(),'MPN')]/a")
//li[contains(text(),'MPN')]/a
.In this pattern, the value of "UPC" is retrieved from the value of "MPN".
=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//li[contains(text(),'UPC')]/a")
//li[contains(text(),'UPC')]/a
.If I misunderstood your question and this was not the result you want, I apologize.
From your replying, I modified the formulas for using the following values.
Here is a list of UPC 787450038417 787450230576 661120974888 859462004015 82442306667 810237023013 798681538782 787450348196 604206120816
=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]]")
//dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]]
.=IMPORTXML("http://gun.deals/search/apachesolr_search/"&B2,"//dd/a")
//dd/a
.Upvotes: 2