Reputation: 51
I am making a sheet to scrape zillow for urls. I have gathered several parts together to get what I need but I am having issues with 1 part.
Paste this into A1
https://www.zillow.com/homes/fsbo/Tulsa-OK-74136/90319_rid/36.079662,-95.899444,36.0415,-95.991712_rect/13_zm/0_mmm/
Paste this into B1
=importXML(A1,"//a/@href")
Then here come a long list of urls from this one page. But what I am looking for is below.
...
...
#
/homedetails/3435-E-64th-St-Tulsa-OK-74136/22113164_zpid/
/myzillow/UpdateFavorites.htm?zpid=22113164&operation=add&ajax=false
/homedetails/7747-S-Fulton-Pl-Tulsa-OK-74136/2092972797_zpid/
/myzillow/UpdateFavorites.htm?zpid=2092972797&operation=add&ajax=false
/homedetails/4324-E-67th-St-UNIT-676-Tulsa-OK-74136/22229329_zpid/
/myzillow/UpdateFavorites.htm?zpid=22229329&operation=add&ajax=false
/homedetails/7801-S-Louisville-Ave-Tulsa-OK-74136/22227172_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227172&operation=add&ajax=false
/homedetails/1612-E-66th-St-Tulsa-OK-74136/22129877_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129877&operation=add&ajax=false
/homedetails/5503-E-73rd-St-Tulsa-OK-74136/22145899_zpid/
/myzillow/UpdateFavorites.htm?zpid=22145899&operation=add&ajax=false
/homedetails/7401-S-Yale-Ave-Tulsa-OK-74136/2101861353_zpid/
/myzillow/UpdateFavorites.htm?zpid=2101861353&operation=add&ajax=false
/homedetails/6508-S-Troost-Ave-Tulsa-OK-74136/22129854_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129854&operation=add&ajax=false
/homedetails/7829-S-Evanston-Ave-Tulsa-OK-74136/22227977_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227977&operation=add&ajax=false
/homedetails/7531-S-Irvington-Ave-Tulsa-OK-74136/2096103489_zpid/
/myzillow/UpdateFavorites.htm?zpid=2096103489&operation=add&ajax=false
/homedetails/1104-E-61st-St-Tulsa-OK-74136/2093334302_zpid/
/myzillow/UpdateFavorites.htm?zpid=2093334302&operation=add&ajax=false
/homedetails/1339-E-67th-St-Tulsa-OK-74136/22114998_zpid/
/myzillow/UpdateFavorites.htm?zpid=22114998&operation=add&ajax=false
/homedetails/7919-S-Braden-Ave-Tulsa-OK-74136/22235368_zpid/
/myzillow/UpdateFavorites.htm?zpid=22235368&operation=add&ajax=false
/homedetails/7014-S-Birmingham-Ct-Tulsa-OK-74136/22168356_zpid/
/myzillow/UpdateFavorites.htm?zpid=22168356&operation=add&ajax=false
/homedetails/7733-S-Hudson-Ave-Tulsa-OK-74136/22236219_zpid/
/myzillow/UpdateFavorites.htm?zpid=22236219&operation=add&ajax=false
#saved-search-lightbox
I want all of the /homedetails/..._zpid/ It can be an array so they all line up in the same column, that would be fine. I am sure a REGEX would do it, but I am not finding what I need. Can any one help?
I have these in my sheet. I just can't get this one to work the way I want.
=ArrayFormula(IfError((QUERY(QUERY(IFERROR(IF({1,1,0},IF({1,0,0},INT((ROW($B:B)-1)/20),MOD(ROW($B:B)-1,20)),importXML($B:$B,"//meta[@property='og:zillow_fb:address']/@content | //meta[@property='product:price:amount']/@content| //div[@class='hdp-fact-ataglance-value'] | //span[@class='contact-badge Listing Agent'] "))),"select min(Col3) where Col3 <> '' group by Col1 pivot Col2",0),"offset 1",0)),""))
=If($B6:B="","",Transpose(importxml($B6:$B,"//span[@class='snl phone']")))
Upvotes: 0
Views: 1385
Reputation: 201643
How about the modification of XPath query for importXML()
? I understood that you want /homedetails/..._zpid/
from your question. Is this correct?
=importXML(A1,"//ul[@class='photo-cards']//a[@class='zsg-photo-card-overlay-link routable hdp-link routable mask hdp-link']/@href")
If you want links both /homedetails/...
and /myzillow/...
, please use =importXML(A1,"//ul[@class='photo-cards']//a/@href")
.
If I misunderstand your question, I'm sorry.
Upvotes: 2