chano

Reputation: 163

Remove whitespace using strip()

How can I strip the whitespace from [u'\n\n\n result here \n\n\n'] so that the result is just [u'result here']? I am using Scrapy.

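def parse_items(self, response):
  hxs = HtmlXPathSelector(response)
  items = []

  # titles and id are selectors taken from hxs; their XPath expressions are not shown
  for title in titles:
      item = CraigslistSampleItem()
      item["job_id"] = id.select('text()').extract()  # gives [u'\n\n\n result here \n\n\n']
      items.append(item)
  return items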

Can anyone help me?

Upvotes: 0

Views: 11212

Answers (2)

paul trmbrth

Reputation: 20748

Alternative to using Python's .strip()

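You can wrap the XPath expression that selects "job_id" in the XPath function normalize-space(), which trims leading and trailing whitespace and collapses internal runs of whitespace into single spaces: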

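def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    # titles: the row selectors taken from hxs (their XPath is not shown here)
    for title in titles:
        item = CraigslistSampleItem()
        # normalize-space() trims the text and collapses inner whitespace runs
        item["job_id"] = title.select('normalize-space(.//td[@scope="row"])').extract()[0].strip()
        items.append(item)
    return items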

Note 1: the XPath expression I use is based on https://careers-cooperhealth.icims.com/jobs/search?ss=1&searchLocation=&searchCategory=&hashed=0
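
For readers unfamiliar with normalize-space(), here is a rough pure-Python equivalent of what it does to a single string. This is only a sketch: the helper name normalize_space is made up for illustration, and Python's split() treats a few more characters as whitespace than XPath does.

```python
def normalize_space(text):
    """Approximate XPath normalize-space() for a single string."""
    # split() with no argument drops leading/trailing whitespace and
    # splits on runs of whitespace; join() puts one space between words.
    return " ".join(text.split())


print(normalize_space("\n\n\n result   here \n\n\n"))  # result here
```

Unlike strip(), this also collapses the extra spaces between "result" and "here".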

Note 2 on the answer using .strip(): with id.select('text()').extract()[0].strip() you get u'result here', not a list.

That may well be what you need. But if you want to keep the list, since you asked to turn [u'\n\n\n result here \n\n\n'] into [u'result here'], you can use Python's map():

item ["job_id"] = map(unicode.strip, id.select('text()').extract())
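
The map() line above is Python 2 code: there map() returns a list and the extracted strings are unicode objects. On Python 3, map() returns a lazy iterator and the string type is str, so a list comprehension is one way to keep a real list. A minimal sketch, with a hard-coded list standing in for what extract() would return:

```python
# Stand-in for the list that .extract() returns (assumed sample data).
extracted = ["\n\n\n result here \n\n\n"]

# Strip every string but keep the result as a list.
stripped = [text.strip() for text in extracted]

print(stripped)  # ['result here']
```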

Upvotes: 5

Ifzal.Ahmed

Reputation: 116

id.select('text()').extract() 

returns a list of strings containing your text. Either iterate over that list and strip each item, or index into it, e.g. your_list[0].strip(), to strip the whitespace. strip() is a method of strings, not of lists.

def parse_items(self, response):
  hxs = HtmlXPathSelector(response)
  items = []

  # titles and id are selectors taken from hxs; their XPath expressions are not shown
  for title in titles:
      item = CraigslistSampleItem()
      # Works when extract() returns at least one string;
      # on an empty list, [0] raises an "index out of range" IndexError.
      item["job_id"] = id.select('text()').extract()[0].strip()
      items.append(item)
  return items
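
One way to avoid the index error mentioned in the comment is to check for an empty list before indexing. A minimal sketch: first_stripped is a hypothetical helper name, not part of Scrapy, and the lists passed in stand in for the result of extract().

```python
def first_stripped(values, default=""):
    """Return the first string in values with whitespace stripped,
    or default when the list is empty."""
    if not values:
        return default
    return values[0].strip()


print(first_stripped(["\n\n\n result here \n\n\n"]))  # result here
print(first_stripped([], default="n/a"))  # n/a
```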

Upvotes: 4
