SC4RECROW

Retrieving only XML Tag Names in Scrapy

The Short:

How can I retrieve only tag names with .xpath() in Scrapy?

The Long:

I am currently using a scrapy.Spider and calling response.selector.remove_namespaces() in its parse() method to keep things simple.

I am trying to do something like this, but with Scrapy:

Iterate on XML tags and get elements' xpath in Python

However, I can't seem to figure out how to retrieve only the name of the tags. What is the .xpath() command to grab just the tag names?

Alexander

There is no built-in way to extract just the tag name from a Scrapy Selector, at least none that I am aware of.

That being said, you can call the re() method (or re_first()) on any selector with a regular expression pattern that captures the tag name.

For example:

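# Inside parse(): each serialized element starts with "<tagname", so anchor
# the pattern and keep only the first match (the element's own name)
for selector in response.xpath("//*"):
    print(selector.re_first(r'^<([\w.:-]+)'))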
