Reputation: 163
How can I strip the whitespace from [u'\n\n\n result here \n\n\n']
so that the result is just [u'result here']?
I am using Scrapy.
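def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []
    # titles: the selector list being looped over (its definition is not shown here)
    for id in titles:
        item = CraigslistSampleItem()
        item["job_id"] = id.select('text()').extract()  # ok, but keeps the surrounding whitespace
        items.append(item)
    return items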
Can anyone help me?
Upvotes: 0
Views: 11212
Reputation: 20748
As an alternative to Python's .strip(), you can wrap the XPath expression that selects "job_id" in the XPath function normalize-space():
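def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []
    # titles: the selector list being looped over (not shown in the question)
    for title in titles:
        item = CraigslistSampleItem()
        # normalize-space() trims both ends and collapses inner whitespace runs
        item["job_id"] = title.select('normalize-space(.//td[@scope="row"])').extract()[0].strip()
        items.append(item)
    return items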
Note 1: the XPath expression I use is based on https://careers-cooperhealth.icims.com/jobs/search?ss=1&searchLocation=&searchCategory=&hashed=0
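To make the effect of normalize-space() concrete, here is a minimal plain-Python sketch of roughly what it does to a string. The helper name normalize_space is hypothetical and is not part of Scrapy. It is only an approximation: Python's str.split() treats any Unicode whitespace (for example a non-breaking space) as a separator, while the XPath function only considers space, tab, carriage return and line feed.

```python
def normalize_space(text):
    """Approximate XPath normalize-space() in plain Python (hypothetical helper)."""
    # split() with no arguments splits on runs of whitespace and ignores
    # leading/trailing whitespace; joining with one space collapses the runs.
    return " ".join(text.split())


raw = u"\n\n\n result   here \n\n\n"  # sample value modelled on the question
print(normalize_space(raw))
```

Unlike .strip(), this also collapses whitespace inside the text, which is the main behavioural difference between the two approaches.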
Note 2, on the answer using .strip(): with id.select('text()').extract()[0].strip() you get u'result here', not a list.
That may very well be what you need. But if you want to keep the list, since you asked how to turn [u'\n\n\n result here \n\n\n'] into [u'result here'], you can use Python's map():
item["job_id"] = map(unicode.strip, id.select('text()').extract())
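The map() call above relies on Python 2: on Python 3 the unicode type no longer exists and map() returns an iterator rather than a list. A list comprehension is one portable way to keep the list. The sketch below uses a hypothetical helper name, strip_all, and a hard-coded list standing in for the output of .extract().

```python
def strip_all(values):
    """Return a new list with surrounding whitespace removed from each string."""
    return [value.strip() for value in values]


# Stand-in for id.select('text()').extract(); the real call returns a list of strings.
extracted = [u"\n\n\n result here \n\n\n"]
print(strip_all(extracted))
```

With this, the assignment would become item["job_id"] = strip_all(id.select('text()').extract()), and an empty result simply yields an empty list.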
Upvotes: 5
Reputation: 116
id.select('text()').extract()
returns a list of strings containing your text. You should either iterate over that list and strip each item, or index into it, e.g. your_list[0].strip(), to strip the whitespace. strip() is a method of strings, not of lists.
def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []
    # titles: the selector list being looped over (not shown in the question)
    for id in titles:
        item = CraigslistSampleItem()
        # Works when the selector matched some text; on an empty list,
        # [0] raises IndexError (list index out of range).
        item["job_id"] = id.select('text()').extract()[0].strip()
        items.append(item)
    return items
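If the selector may match nothing, the IndexError mentioned above can be avoided with a small guard. This is only a sketch: first_stripped and its default parameter are hypothetical names, not part of Scrapy.

```python
def first_stripped(values, default=None):
    """Return the first string in `values`, stripped, or `default` if the list is empty."""
    return values[0].strip() if values else default


print(first_stripped([u"\n\n\n result here \n\n\n"]))
print(first_stripped([], default=u""))
```

In the spider this would read item["job_id"] = first_stripped(id.select('text()').extract()), leaving job_id as None when no text was found.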
Upvotes: 4