Reputation: 2075
I've been building a web scraper in Python 3 using the Scrapy library and I'm running into a problem I don't understand. I've successfully scraped other tables by using Inspect Element on the table to get its XPath. With this table, however, I can't figure out how to extract the data. I am new to HTML but not new to programming, so please tell me if I'm way off here.
An example of this web page would be: http://land.elpasoco.com/ResidentialBuilding.aspx?schd=5317443025&bldg=1
Inspecting the page and getting the xpath for the target table yields //*[@id="aspnetForm"]/table/tbody/tr[3]/td[1]/table/tbody/tr[1]/td/table/tbody/tr[3]/td/table
However, running response.xpath(target).extract() with this XPath in a Scrapy shell returns []. Trying to target any individual cell gives the same empty result. My intended result would be a dataframe or dictionary along the lines of {'Dwelling Units': 1, 'Year Built': 2010 ... }
Any help identifying where I'm going wrong, or how to get the data formatted this way, would be appreciated. Thanks!
Upvotes: 0
Views: 1717
Reputation: 5451
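A likely cause of the empty list (an assumption on my part, since I'm going by the copied XPath rather than the raw page source): browsers insert tbody elements into the DOM even when the HTML sent by the server has none, so an XPath copied from Inspect Element can match nothing in the response Scrapy downloads. A minimal sketch of stripping those steps from the XPath in the question before trying it in the shell; strip_tbody is a hypothetical helper name, not part of Scrapy:

```python
import re


def strip_tbody(xpath):
    """Remove tbody steps (optionally indexed, e.g. tbody[1]) from an XPath."""
    return re.sub(r'/tbody(\[\d+\])?(?=/|$)', '', xpath)


# XPath copied from the browser, as given in the question.
browser_xpath = (
    '//*[@id="aspnetForm"]/table/tbody/tr[3]/td[1]/table/tbody'
    '/tr[1]/td/table/tbody/tr[3]/td/table'
)

# Candidate to try with response.xpath(target) in the Scrapy shell.
target = strip_tbody(browser_xpath)
print(target)
```

Long positional paths like this still tend to be brittle, so a shorter expression keyed on an attribute, as in the spider that follows, is usually the easier route.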
import scrapy


class ResidentialRecordsSpider(scrapy.Spider):
    name = "residential_records"
    start_urls = [
        'http://land.elpasoco.com/ResidentialBuilding.aspx?schd=5317443025&bldg=1',
    ]

    def parse(self, response):
        # Treat each cell as a bold label (<strong>) followed by its value as plain text.
        for record in response.xpath('//table[@width="90%"]//td'):
            key = record.xpath('./strong/text()').extract_first(default='')
            value = record.xpath('./text()').extract_first(default='')
            # One single-pair dict per cell; cells without a label yield {'': ...}.
            yield {key: value}
The only thing left for you to do is some data cleaning on the yielded keys and values.
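As a rough sketch of what that cleaning could look like, assuming the spider yields single-pair dicts whose labels may carry a trailing colon and whose values may carry stray whitespace (the sample data below is made up for illustration, and clean_records is a hypothetical helper), you could merge everything into the one dictionary asked for in the question:

```python
def clean_records(records):
    """Merge single-pair dicts into one dict, tidying keys and values."""
    cleaned = {}
    for record in records:
        for key, value in record.items():
            # Drop surrounding whitespace and a trailing colon from the label.
            key = key.strip().rstrip(':').strip()
            value = value.strip()
            if not key:
                continue  # cell had no <strong> label
            # Convert purely numeric values to int, keep everything else as text.
            cleaned[key] = int(value) if value.isdecimal() else value
    return cleaned


# Hypothetical sample of what the spider might yield; the real page's
# labels and whitespace may differ.
sample = [
    {'Dwelling Units:': ' 1 '},
    {'': ''},
    {'Year Built:': '\r\n 2010'},
]
result = clean_records(sample)
print(result)  # {'Dwelling Units': 1, 'Year Built': 2010}
```

If you want a dataframe instead, a dictionary in this shape can be passed to pandas as a single row, e.g. pandas.DataFrame([result]).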
Upvotes: 1