ConnorU

Reputation: 1409

Scrapy: Extract value from meta tag

I'm building a crawler for a site that, for some reason, stores the ID numbers of the items I'm extracting in meta tags like this:

<meta content="1001662613">

where the number in quotation marks is the number I want.

I tried using the XPath

Id = title.select('//meta [@content]').extract()

But that returns an empty result. Using

Id = title.select('//meta/@content').extract()

in turn gives me the entire page's source code after the meta tag...

Is there any way to extract the number from the tag itself, instead of trying to go into the tag (which is empty)?

For reference, here's the section of the page's source where the ID number is located:

<link rel="stylesheet" type="text/css" href="/ccss/2076d1c6bea75c5b6f4c753b3b4920b6_14bfe2d5b91d791bc05282634acdfb68.css" />
<script type="text/javascript" src="/cjs/986570aebf4e6cef6e0a52faa9c5a8a2_f4ceae6565fa007f39ee4e0abe02ab7b.js"></script>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.16/jquery-ui.min.js"></script>
<script type="text/javascript" src="/cjs/a373b58f85b5e68c60f3edc35b348e14_a2abaa7837c3e1ccda94d6fe6b0f7a8f.js"></script>
<meta content="1001657519"/>
<link href="http://www.groupon.com.uy/descuentos/montevideo/sushi-go-26-12-7" rel="canonical" />
<link href="http://www.groupon.com.uy/deals/feed.rss" type="application/rss+xml" rel="alternate" title="Groupon - Descuentos" />
<meta name="title" content="Desde $264 en vez de $462 por 24, 48 o 72 piezas de sushi en Sushi Go"/>

Upvotes: 3

Views: 6181

Answers (1)

alecxe

Reputation: 474191

//meta/@content returns multiple results because the page contains multiple meta tags. Just keep the values that consist only of digits:

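# Collect the content attribute of every meta tag on the page
ids = title.select('//meta/@content').extract()
# Keep only the purely numeric values (the item IDs)
print([value for value in ids if value.isdigit()])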
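
To try the same filter without a running spider, here is a minimal sketch that swaps Scrapy's selector for the standard library's html.parser, so it runs on its own. The names MetaContentCollector, numeric_meta_ids and SAMPLE are made up for illustration, and the sample markup is a trimmed copy of the snippet from the question.

```python
from html.parser import HTMLParser


class MetaContentCollector(HTMLParser):
    """Hypothetical helper: collects the content attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are routed here as well.
        if tag == "meta":
            value = dict(attrs).get("content")
            if value is not None:
                self.contents.append(value)


def numeric_meta_ids(html):
    """Return the meta content values that consist only of digits."""
    collector = MetaContentCollector()
    collector.feed(html)
    collector.close()
    return [value for value in collector.contents if value.isdigit()]


# Trimmed copy of the markup shown in the question.
SAMPLE = '''
<meta content="1001657519"/>
<link href="http://www.groupon.com.uy/deals/feed.rss" type="application/rss+xml" rel="alternate" title="Groupon - Descuentos" />
<meta name="title" content="Desde $264 en vez de $462 por 24, 48 o 72 piezas de sushi en Sushi Go"/>
'''

print(numeric_meta_ids(SAMPLE))
```

The title meta tag is dropped because its content is not purely numeric, which is the same isdigit() check applied to the Scrapy results above.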

Hope that helps.

Upvotes: 2
