scrapy python re statement

Question

I am learning Scrapy. I am using Scrapy 0.20, which is why I am following this tutorial: http://doc.scrapy.org/en/0.20/intro/tutorial.html

I understood the concepts. However, one thing is still unclear to me.

In this statement

sel.xpath('//title/text()').re('(\w+):')

the output is

[u'Computers', u'Programming', u'Languages', u'Python']

What is re('(\w+):') used for?

This statement

sel.xpath('//title/text()').extract()

has this output:

[u'Open Directory - Computers: Programming: Languages: Python: Books']

In the re() output, why are commas added between the elements, and why have all the ':' characters been removed?

Also, is this pure Python syntax?

Accepted Answer (e h)

This is a regular expression (regex), and is a whole world unto itself.
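To show where the regex fits, here is a rough stdlib-only emulation of the two Scrapy calls from the question. It is a sketch under stated assumptions, not Scrapy's implementation: ElementTree's element lookup stands in for the XPath '//title/text()', and re.findall stands in for .re(), which applies the regex to each extracted string and collects the matches into one list. The helper names extract_titles and selector_re and the sample HTML are hypothetical. Both calls return ordinary Python lists, so the commas in the output are simply how Python prints list elements. The u'...' prefix in the question is Python 2's marker for unicode strings; this sketch targets Python 3, where no prefix is shown. As for "pure Python": .xpath(), .extract() and .re() are Scrapy selector methods, while the pattern itself uses the syntax of Python's standard re module.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page; only the <title> element matters here.
SAMPLE_HTML = (
    "<html><head>"
    "<title>Open Directory - Computers: Programming: Languages: Python: Books</title>"
    "</head><body></body></html>"
)


def extract_titles(html):
    """Rough stand-in for sel.xpath('//title/text()').extract():
    return the text of every <title> element as a list of strings."""
    root = ET.fromstring(html)
    return [el.text for el in root.iter("title") if el.text is not None]


def selector_re(strings, pattern):
    """Rough stand-in for .re(pattern): run re.findall on each
    extracted string and collect all matches into one flat list."""
    regex = re.compile(pattern)
    matches = []
    for s in strings:
        matches.extend(regex.findall(s))
    return matches


titles = extract_titles(SAMPLE_HTML)
print(titles)
# ['Open Directory - Computers: Programming: Languages: Python: Books']

words = selector_re(titles, r"(\w+):")
print(words)
# ['Computers', 'Programming', 'Languages', 'Python']
```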

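(\w+): matches each run of word characters (letters, digits, underscore) that is immediately followed by a colon. The colon must be present for a match, but only the part inside the parentheses (the capturing group) is returned, which is why the ":" is removed.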
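A quick way to check this is to run the same pattern with Python's standard re module on the title text (copied from the question's extract() output). When a pattern has exactly one capturing group, re.findall returns only that group, so the colon is matched but not returned. Note also that \w does not match spaces or hyphens, so "Open Directory -" never becomes part of a match, and "Books" is skipped because no colon follows it.

```python
import re

# Title text copied from the question's extract() output.
title = "Open Directory - Computers: Programming: Languages: Python: Books"

# One capturing group: findall returns only the group, so ':' is matched but dropped.
without_colon = re.findall(r"(\w+):", title)
print(without_colon)
# ['Computers', 'Programming', 'Languages', 'Python']

# For a single match, group(0) is the whole match and group(1) is the captured part.
first = re.search(r"(\w+):", title)
print(first.group(0), first.group(1))
# Computers: Computers
```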

(\w+:) matches the same words, but because the colon is now inside the parentheses it is part of the captured group, so each result keeps its ":".
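The same check with the colon moved inside the group, again using Python's standard re module on the title text from the question. As a side note, a pattern with no groups at all makes re.findall return the whole match, which here gives the same list.

```python
import re

# Title text copied from the question's extract() output.
title = "Open Directory - Computers: Programming: Languages: Python: Books"

# The colon is inside the capturing group, so it is part of each result.
with_colon = re.findall(r"(\w+:)", title)
print(with_colon)
# ['Computers:', 'Programming:', 'Languages:', 'Python:']

# With no groups, findall returns the whole match: the same list here.
no_group = re.findall(r"\w+:", title)
print(no_group == with_colon)
# True
```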

Also, if you want to learn more about regex, Codecademy has a good Python course.
