Arlen Gregory
Arlen Gregory

Reputation: 51

Google sheets Regex combined with importxml to limit what i get

I am making a sheet to scrape zillow for urls. I have gathered several parts together to get what I need but I am having issues with 1 part.

Paste this into A1

https://www.zillow.com/homes/fsbo/Tulsa-OK-74136/90319_rid/36.079662,-95.899444,36.0415,-95.991712_rect/13_zm/0_mmm/

Paste this into B1

=importXML(A1,"//a/@href")

Then here come a long list of urls from this one page. But what I am looking for is below.

...
...
#
/homedetails/3435-E-64th-St-Tulsa-OK-74136/22113164_zpid/
/myzillow/UpdateFavorites.htm?zpid=22113164&operation=add&ajax=false
/homedetails/7747-S-Fulton-Pl-Tulsa-OK-74136/2092972797_zpid/
/myzillow/UpdateFavorites.htm?zpid=2092972797&operation=add&ajax=false
/homedetails/4324-E-67th-St-UNIT-676-Tulsa-OK-74136/22229329_zpid/
/myzillow/UpdateFavorites.htm?zpid=22229329&operation=add&ajax=false
/homedetails/7801-S-Louisville-Ave-Tulsa-OK-74136/22227172_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227172&operation=add&ajax=false
/homedetails/1612-E-66th-St-Tulsa-OK-74136/22129877_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129877&operation=add&ajax=false
/homedetails/5503-E-73rd-St-Tulsa-OK-74136/22145899_zpid/
/myzillow/UpdateFavorites.htm?zpid=22145899&operation=add&ajax=false
/homedetails/7401-S-Yale-Ave-Tulsa-OK-74136/2101861353_zpid/
/myzillow/UpdateFavorites.htm?zpid=2101861353&operation=add&ajax=false
/homedetails/6508-S-Troost-Ave-Tulsa-OK-74136/22129854_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129854&operation=add&ajax=false
/homedetails/7829-S-Evanston-Ave-Tulsa-OK-74136/22227977_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227977&operation=add&ajax=false
/homedetails/7531-S-Irvington-Ave-Tulsa-OK-74136/2096103489_zpid/
/myzillow/UpdateFavorites.htm?zpid=2096103489&operation=add&ajax=false
/homedetails/1104-E-61st-St-Tulsa-OK-74136/2093334302_zpid/
/myzillow/UpdateFavorites.htm?zpid=2093334302&operation=add&ajax=false
/homedetails/1339-E-67th-St-Tulsa-OK-74136/22114998_zpid/
/myzillow/UpdateFavorites.htm?zpid=22114998&operation=add&ajax=false
/homedetails/7919-S-Braden-Ave-Tulsa-OK-74136/22235368_zpid/
/myzillow/UpdateFavorites.htm?zpid=22235368&operation=add&ajax=false
/homedetails/7014-S-Birmingham-Ct-Tulsa-OK-74136/22168356_zpid/
/myzillow/UpdateFavorites.htm?zpid=22168356&operation=add&ajax=false
/homedetails/7733-S-Hudson-Ave-Tulsa-OK-74136/22236219_zpid/
/myzillow/UpdateFavorites.htm?zpid=22236219&operation=add&ajax=false
#saved-search-lightbox

I want all of the /homedetails/..._zpid/ It can be an array so they all line up in the same column, that would be fine. I am sure a REGEX would do it, but I am not finding what I need. Can any one help?

I have these in my sheet. I just can't get this one to work the way I want.

=ArrayFormula(IfError((QUERY(QUERY(IFERROR(IF({1,1,0},IF({1,0,0},INT((ROW($B:B)-1)/20),MOD(ROW($B:B)-1,20)),importXML($B:$B,"//meta[@property='og:zillow_fb:address']/@content | //meta[@property='product:price:amount']/@content| //div[@class='hdp-fact-ataglance-value'] | //span[@class='contact-badge Listing Agent'] "))),"select min(Col3) where Col3 <> '' group by Col1 pivot Col2",0),"offset 1",0)),""))
=If($B6:B="","",Transpose(importxml($B6:$B,"//span[@class='snl phone']")))

Upvotes: 0

Views: 1385

Answers (1)

Tanaike
Tanaike

Reputation: 201643

How about the modification of XPath query for importXML()? I understood that you want /homedetails/..._zpid/ from your question. Is this correct?

Modified XPath query :

=importXML(A1,"//ul[@class='photo-cards']//a[@class='zsg-photo-card-overlay-link routable hdp-link routable mask hdp-link']/@href")

Result :

enter image description here

Note :

If you want links both /homedetails/... and /myzillow/..., please use =importXML(A1,"//ul[@class='photo-cards']//a/@href").

If I misunderstand your question, I'm sorry.

Upvotes: 2

Related Questions