I am using the Scrapy shell to extract some text data. Here are the commands I ran:
$ scrapy shell "http://jobs.parklandcareers.com/dallas/nursing/jobid6541851-nurse-resident-cardiopulmonary-icu-feb2015-nurse-residency-requires-contract-jobs"
>>> response.xpath('//*[@id="jobDesc"]/span[1]/text()')
[<Selector xpath='//*[@id="jobDesc"]/span[1]/text()' data=u'Dallas, TX'>]
>>> response.xpath('//*[@id="jobDesc"]/span[2]/p/text()[2]')
[<Selector xpath='//*[@id="jobDesc"]/span[2]/p/text()[2]' data=u'Responsible for attending assigned nursi'>]
>>> response.xpath('//*[@id="jobDesc"]/span[2]/p/text()[preceding-sibling::*="Education"][following-sibling::*="Certification"]')
[]
The third command returns no data. I was trying to extract the text between two keywords. Where am I going wrong?
//*[@id="jobDesc"]/span[2]/p/text()
returns a list of text nodes. You can filter the relevant nodes in Python. Here's how to get the text between the "Education/Experience:" and "Certification/Registration/Licensure:" text nodes:
>>> result = response.xpath('//*[@id="jobDesc"]/span[2]/p/text()').extract()
>>> start = result.index('Education/Experience:')
>>> end = result.index('Certification/Registration/Licensure:')
>>> print(''.join(result[start+1:end]))
- Must be a graduate from an accredited school of Nursing.
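The same slicing idea can be wrapped in a small helper so that it tolerates whitespace around the heading nodes and returns an empty string instead of raising ValueError when a heading is missing. This is a minimal sketch: the function name text_between is hypothetical, and the sample list is made-up data shaped like the output of .extract() above, not the page's actual nodes.

```python
def text_between(nodes, start_marker, end_marker):
    """Join the text nodes that sit strictly between two marker nodes.

    `nodes` is a list of strings, e.g. the result of
    response.xpath('//*[@id="jobDesc"]/span[2]/p/text()').extract().
    Markers are compared after stripping whitespace, assuming the page
    may pad text nodes with spaces or newlines.
    """
    stripped = [n.strip() for n in nodes]
    try:
        start = stripped.index(start_marker)
        # Search for the end marker only after the start marker.
        end = stripped.index(end_marker, start + 1)
    except ValueError:
        # A marker is missing, or the end marker only appears before the start.
        return ''
    return ''.join(nodes[start + 1:end]).strip()


# Hypothetical text nodes; only the Nursing line comes from the thread.
sample = [
    'intro text',
    'Education/Experience:',
    '- Must be a graduate from an accredited school of Nursing.',
    'Certification/Registration/Licensure:',
    'other text',
]
print(text_between(sample, 'Education/Experience:',
                   'Certification/Registration/Licensure:'))
```

Returning an empty string keeps a spider from crashing on job pages that omit one of the headings; if you would rather notice such pages, let the ValueError propagate instead.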
Update (regarding an additional question in the comments):
>>> response.xpath('//*[@id="jobDesc"]/span[3]/text()').re(r'Job ID: (\d+)')
[u'143112']
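Selector.re() applies a regular expression to the selected text and, when the pattern has a capturing group, returns the captured strings, much like Python's re.findall. Writing the pattern as a raw string (r'...') keeps the backslash in \d intact instead of relying on Python's lenient handling of unknown escapes. The sketch below reproduces the extraction with the standard re module; the helper name job_ids and the input text are hypothetical.

```python
import re

# Capture the digits that follow the "Job ID: " label.
JOB_ID = re.compile(r'Job ID: (\d+)')


def job_ids(text):
    """Return every job ID found in `text` as a list of strings."""
    return JOB_ID.findall(text)


# Hypothetical text of the span[3] node.
print(job_ids('Job ID: 143112'))  # ['143112']
```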
Try:
substring-before(
    substring-after(//*[@id="jobDesc"]/span[2]/p, 'Education'), 'Certification')
Note: I couldn't test it.
The idea is that you cannot use preceding-sibling and following-sibling because you are looking inside the same text node. Instead, extract the part of the text you want with substring-before() and substring-after(); nesting the two functions selects what lies in between.
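In XPath 1.0, substring-after(s, t) returns the part of s after the first occurrence of t (or an empty string if t does not occur), and substring-before(s, t) returns the part before it. When the argument is a node-set, XPath converts it to the string value of its first node, so passing the p element (whose string value concatenates all of its descendant text nodes) covers the whole paragraph, whereas a path ending in text() would supply only the first text node. The sketch below mimics the two functions in plain Python to show how the nesting works; the helper names and the paragraph string are hypothetical, not part of Scrapy or taken from the page.

```python
def substring_after(s, t):
    """Mimic XPath 1.0 substring-after(): the text after the first t, or ''."""
    i = s.find(t)
    return s[i + len(t):] if i != -1 else ''


def substring_before(s, t):
    """Mimic XPath 1.0 substring-before(): the text before the first t, or ''."""
    i = s.find(t)
    return s[:i] if i != -1 else ''


# Hypothetical string value of the <p> element: its text nodes are
# concatenated, so headings and body text run together.
paragraph = ('Education/Experience:'
             '- Must be a graduate from an accredited school of Nursing.'
             'Certification/Registration/Licensure:'
             'placeholder text')

between = substring_before(
    substring_after(paragraph, 'Education/Experience:'),
    'Certification/Registration/Licensure:')
print(between)  # - Must be a graduate from an accredited school of Nursing.

# A quoted path is just a string literal; it contains neither marker,
# so the nested calls yield an empty string.
print(repr(substring_after('//*[@id="jobDesc"]/span[2]/p/text()', 'Education')))  # ''
```

Using the short keywords 'Education' and 'Certification' instead of the full headings would leave '/Experience:' at the start of the result, so matching the complete heading text gives a cleaner extraction.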