I am using Scrapy, XPath, and Python to scrape a website. The results I get contain \r\n. A Google search suggested that I need to use normalize-space() in my XPath, but when I do it as shown below, it does not work.
item ['runs'] = stats.select((normalize-space('//tr[@class="cell1"]/td[3]/text()')[count])).extract()
I get a "global name 'normalize' is not defined" error.
Any ideas?
normalize-space is part of XPath, not Python, so there is no such function in Python or in the libraries you call from it. It has to go inside the XPath expression string itself, for example:
stats.select('''//tr[normalize-space(td/text()) = 'User Name']''').extract()
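The NameError in the question happens because Python, not XPath, is reading normalize-space(...): outside a string it is the subtraction normalize - space(...), and the left operand normalize is looked up first. A minimal Python 3 sketch with the standard ast module parses the question's expression (without the [count] index) instead of running it:

```python
import ast

# The question's expression, minus the [count] index, as plain Python source
source = """normalize-space('//tr[@class="cell1"]/td[3]/text()')"""
expr = ast.parse(source, mode="eval").body

# Python sees a subtraction: the name `normalize` minus a call to `space(...)`
print(type(expr).__name__, type(expr.op).__name__)  # BinOp Sub
print(expr.left.id, expr.right.func.id)             # normalize space

# Evaluating it fails on the left operand, before space() is ever called
try:
    eval(source, {})
except NameError as exc:
    message = str(exc)
    print(message)  # name 'normalize' is not defined
```

Moving the call into the quoted XPath string, as in the example above, means Python only ever sees a string and passes it unchanged to the XPath engine.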
If you just want to drop the whitespace from a string in Python, you can use str methods instead. For example, strip removes the leading and trailing whitespace:
>>> '\r\n\rsample\r\n'.strip()
'sample'
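Because .extract() returns a list of strings rather than a single string, strip has to be applied to each element. A minimal sketch, using a made-up list in place of real scraped values:

```python
# Hypothetical values standing in for what .extract() might return
raw = ['\r\n  12\r\n', '\r\n', ' 7 \r\n']

# Strip each string and drop the ones that contained only whitespace
cleaned = [s.strip() for s in raw if s.strip()]
print(cleaned)  # ['12', '7']
```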
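For something like normalize-space, which also collapses runs of whitespace inside the string into single spaces, split the string and join the pieces: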
>>> ' '.join('\r\ns am \r\n ple\r\n'.split())
's am ple'
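The split/join idiom is close to, but not identical to, XPath's normalize-space. XPath 1.0 counts only space, tab, carriage return and line feed as whitespace, while str.split() with no arguments splits on any Unicode whitespace, including the non-breaking space (\xa0) that &nbsp; in scraped HTML turns into. A sketch of a closer emulation in Python 3 using the re module; normalize_space is a hypothetical helper name, not a library function:

```python
import re

# XPath 1.0 whitespace is only #x20, #x9, #xD, #xA (space, tab, CR, LF)
_XPATH_WS = re.compile(r'[ \t\r\n]+')

def normalize_space(s):
    """Hypothetical helper mimicking XPath 1.0 normalize-space() on a str."""
    # Collapse each whitespace run to one space, then trim the ends
    return _XPATH_WS.sub(' ', s).strip(' ')

print(repr(normalize_space('\r\ns am \r\n ple\r\n')))  # 's am ple'

# A non-breaking space survives here but counts as whitespace for str.split()
text = 's\xa0am ple'
print(repr(normalize_space(text)))   # 's\xa0am ple'
print(repr(' '.join(text.split())))  # 's am ple'
```

For ordinary \r\n noise both approaches give the same result; the difference only matters when the page contains characters such as &nbsp;.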