Jake.bake

Reputation: 11

Pull information using XPath

Does anyone know how to get the date from this using Scrapy?

['<a href="/realDonaldTrump/status/988856839893897222" class="tweet-timestamp js-permalink js-nav js-tooltip" title="12:06 PM - 24 Apr 2018" data-conversation-id="988856839893897222"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-time="1524596817" data-time-ms="1524596817000" data-long-form="true">Apr 24</span></a>']

I obtained this text using

 response.xpath('//*[contains(@class,"tweet-timestamp js-permalink js-nav js-tooltip")]').extract()

I'm after the value of the title attribute (the text after "title="). I'm fairly new to this, so if you could also explain why it works, even better. Thanks.

Upvotes: 0

Views: 70

Answers (2)

LMC

Reputation: 12672

Get the Unix timestamp (in seconds) stored in the @data-time attribute and parse it. The millisecond version is in @data-time-ms.

import datetime
d = float(response.xpath("string(//a[contains(@class,'tweet-timestamp')]/span/@data-time)").extract_first())
datetime.datetime.fromtimestamp(d).strftime('%Y-%m-%d %H:%M:%S')

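Output (fromtimestamp() without a tz argument converts to the local time zone of the machine running it, so the hour you get may differ):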

'2018-04-24 16:06:57'
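A minimal stdlib sketch of the same conversion, assuming you have already extracted the two attribute strings (hard-coded below instead of coming from response.xpath). Passing tz=timezone.utc makes the result identical on every machine, and data-time-ms has to be divided by 1000 before converting. The fixed UTC-7 offset labelled PDT is an assumption inferred from the title attribute (12:06 PM), not something the page states.

```python
from datetime import datetime, timedelta, timezone

# Attribute values copied from the question's snippet; in Scrapy they would
# come from response.xpath(...).extract_first().
DATA_TIME = "1524596817"        # seconds since the Unix epoch
DATA_TIME_MS = "1524596817000"  # the same instant in milliseconds


def to_utc(seconds: float) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


from_seconds = to_utc(float(DATA_TIME))
from_millis = to_utc(int(DATA_TIME_MS) / 1000)  # milliseconds -> seconds first

# Assumption: the title attribute appears to be rendered in US Pacific daylight
# time (UTC-7); this fixed offset is inferred, not read from the page.
PDT = timezone(timedelta(hours=-7), "PDT")

print(from_seconds.strftime("%Y-%m-%d %H:%M:%S %Z"))             # 2018-04-24 19:06:57 UTC
print(from_seconds.astimezone(PDT).strftime("%I:%M %p - %d %b %Y"))  # 12:06 PM - 24 Apr 2018
```

Converting the UTC value to UTC-7 reproduces the text of the title attribute, which is a quick way to check that both attributes describe the same moment.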

Upvotes: 1

SIM

Reputation: 22440

Try the XPath below to get the date you want to parse. The date is stored in the title attribute. To read the value of any attribute, select it by its name with the @ prefix, much like looking up a key in a dictionary. Here the key is title and the value is 12:06 PM - 24 Apr 2018.

response.xpath("//a[contains(@class,'tweet-timestamp')]/@title").extract_first()

Output:

12:06 PM - 24 Apr 2018
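To make the dictionary analogy concrete, here is a stdlib-only sketch that swaps Scrapy for xml.etree.ElementTree (this works only because this particular snippet happens to be well-formed XML; real pages normally need an HTML parser such as Scrapy's selectors). It reads the title attribute with a plain dict lookup and then turns the text into a datetime with strptime. The format string is an assumption based on this single sample, and %p and %b expect an English (C) locale.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The anchor element from the question, reassembled into one string.
SNIPPET = (
    '<a href="/realDonaldTrump/status/988856839893897222" '
    'class="tweet-timestamp js-permalink js-nav js-tooltip" '
    'title="12:06 PM - 24 Apr 2018" '
    'data-conversation-id="988856839893897222">'
    '<span class="_timestamp js-short-timestamp " data-aria-label-part="last" '
    'data-time="1524596817" data-time-ms="1524596817000" '
    'data-long-form="true">Apr 24</span></a>'
)

anchor = ET.fromstring(SNIPPET)  # the root element is the <a> tag

# An element's attributes are exposed as an ordinary dict: name -> value.
title = anchor.attrib["title"]

# "12:06 PM - 24 Apr 2018": %I = 12-hour clock, %p = AM/PM, %b = short month name.
posted = datetime.strptime(title, "%I:%M %p - %d %b %Y")

print(title)               # 12:06 PM - 24 Apr 2018
print(posted.isoformat())  # 2018-04-24T12:06:00
```

Note that the parsed datetime is naive: the title text carries no time zone, so the result is only as accurate as your assumption about which zone the page rendered it in.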

Upvotes: 2
