Reputation: 89
I am trying to extract table data from this page. After inspecting the requests in the browser's Network tab, I figured out that an API call could provide the table data I need, so I tried to mimic that request with Python Scrapy. Here is the code and the response:
In [27]: url
Out[27]: 'https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1'
In [28]: headers
Out[28]: {'X-XSRF-TOKEN': 'eyJpdiI6Ims2ZVJxT3pRRUplSCtLZXRVZXA3cXc9PSIsInZhbHVlIjoiaDJaQ0hhVWQwUU9zMEQ2S1FqVEVxR3hPYTJYRzd3d0VWWkZzMUhYQmRPSGVoaWVtTnBNUXZzdkJhTngvS2xNLyIsIm1hYyI6Ijc3MzY1N2M4ZDljMWQ4MDY4OTA5ZGQwNmUzYThiNDNkMDNlZDUyZmQ1Mjc4ZTU0MzkwMjA3ZDFmMDAwMTdkYTMifQ=='}
In [29]: fetch(scrapy.Request(url,headers=headers))
2021-03-03 12:12:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1> (referer: None)
Is there anything I am missing in the headers, or somewhere else?
Upvotes: 0
Views: 730
Reputation: 3730
When you visit https://www.barchart.com/stocks/quotes/MSFT/competitors you get a response header with set-cookie=laravel-token... and some other cookies. I tried all of the cookies, and laravel-token is the one used for authentication. You also need the x-xsrf-token header that you've already extracted.
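The XSRF token can also be read from the raw Set-Cookie header values without a regex. This is a minimal sketch; the helper name `cookie_value` is hypothetical, and it assumes (as the `unquote` call in the spider suggests) that the server URL-encodes the cookie value. The header values in the usage lines are made up for illustration:

```python
from urllib.parse import unquote


def cookie_value(set_cookie_headers, name):
    """Return the URL-decoded value of cookie `name`, or None if absent.

    `set_cookie_headers` is an iterable of raw Set-Cookie header values,
    either bytes (Scrapy's response.headers.getlist returns bytes) or str.
    """
    for raw in set_cookie_headers:
        if isinstance(raw, bytes):
            raw = raw.decode('latin-1')
        # The cookie itself is the part before the first ';'; everything
        # after it is attributes such as path, expires or httponly.
        pair = raw.split(';', 1)[0]
        key, sep, value = pair.partition('=')
        if sep and key.strip() == name:
            return unquote(value.strip())
    return None


# Made-up header values, shaped like what a server might send.
example_headers = [
    b'some_cookie=abc; path=/; httponly',
    b'XSRF-TOKEN=eyJpdiI6%3D%3D; path=/',
]
print(cookie_value(example_headers, 'XSRF-TOKEN'))  # eyJpdiI6==
```

Inside a Scrapy callback this would be called as `cookie_value(response.headers.getlist('Set-Cookie'), 'XSRF-TOKEN')`. Splitting on `;` and `=` avoids assuming which characters the token contains, which a pattern like `\w+==` does.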
To solve this in Scrapy, first make sure cookies are enabled in settings.py (COOKIES_ENABLED, which is True by default). Then send a request to https://www.barchart.com/stocks/quotes/MSFT/competitors, and in the parse callback of that request send the next request to the API URL from your question. Scrapy will then handle the cookies automatically.
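The API URL is long and hard to edit as a single string. One option is to build it from a dict of query parameters with the standard library's `urlencode`. The sketch below (function name `quotes_url` is hypothetical) reproduces the URL from the question exactly and only exposes `page` and `limit`, the parameters most likely to change when paging through results. The `lists` value `stocks.inSector.all(-COSO)` is copied as-is; it appears to be specific to MSFT's sector, so it is left fixed rather than tied to `symbol`:

```python
from urllib.parse import urlencode

API_URL = 'https://www.barchart.com/proxies/core-api/v1/quotes/get'


def quotes_url(page=1, limit=100):
    """Build the quotes API URL from the question, varying only paging."""
    params = {
        'symbol': 'MSFT',
        'lists': 'stocks.inSector.all(-COSO)',
        'fields': ','.join([
            'symbol', 'symbolName', 'weightedAlpha', 'lastPrice',
            'priceChange', 'percentChange', 'highPrice1y', 'lowPrice1y',
            'percentChange1y', 'tradeTime', 'symbolCode', 'symbolType',
            'hasOptions',
        ]),
        'orderBy': 'weightedAlpha',
        'orderDir': 'desc',
        'meta': 'field.shortName,field.type,field.description',
        'hasOptions': 'true',
        'page': page,
        'limit': limit,
        'raw': 1,
    }
    # safe='(),' keeps commas and parentheses unescaped so the result
    # matches the URL copied from the browser character for character.
    return API_URL + '?' + urlencode(params, safe='(),')
```

The spider's second request could then use `url=quotes_url()` instead of the hand-split string literal.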
Here's an example spider that worked for me (I extracted the XSRF token rather sloppily; you probably have a better way):
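import re
from urllib.parse import unquote

import scrapy


class TestSpider(scrapy.Spider):
    name = 'testspider'

    def start_requests(self):
        # Visiting the HTML page first makes the server set the
        # laravel-token and XSRF-TOKEN cookies, which Scrapy's cookie
        # middleware stores and resends on later requests.
        yield scrapy.Request(
            url='https://www.barchart.com/stocks/quotes/MSFT/competitors',
        )

    def parse(self, response):
        xsrf_token = None
        for set_cookie in response.headers.getlist('Set-Cookie'):
            # The cookie value is URL-encoded, so decode before matching.
            # [^;]+ also accepts '+' and '/', which base64 tokens can contain.
            match = re.search(r'XSRF-TOKEN=([^;]+)', unquote(set_cookie.decode('utf-8')))
            if match:
                xsrf_token = match.group(1)
                break
        if xsrf_token is None:
            # Without this check the request below would raise NameError.
            self.logger.error('XSRF-TOKEN cookie not found')
            return
        # Cookies are sent automatically; the XSRF token must also go in
        # the x-xsrf-token request header.
        yield scrapy.Request(
            url='https://www.barchart.com/proxies/core-api/v1/quotes/get?'
                'symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symb'
                'ol,symbolName,weightedAlpha,lastPrice,priceChange,percen'
                'tChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime'
                ',symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&'
                'orderDir=desc&meta=field.shortName,field.type,field.desc'
                'ription&hasOptions=true&page=1&limit=100&raw=1',
            callback=self.parse_data,
            headers={
                'x-xsrf-token': xsrf_token,
            },
        )

    def parse_data(self, response):
        pass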
Output
2021-03-03 12:26:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.barchart.com/stocks/quotes/MSFT/competitors> (referer: None)
2021-03-03 12:26:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1> (referer: https://www.barchart.com/stocks/quotes/MSFT/competitors)
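The `parse_data` callback above is left empty. A minimal sketch of filling it in, assuming (not confirmed here) that the JSON body has a top-level `"data"` list with one object per table row; the helper name `rows_from_quotes_json` and the default field selection are hypothetical, and the sample body in the usage lines is made up:

```python
import json


def rows_from_quotes_json(body, fields=('symbol', 'symbolName', 'lastPrice')):
    """Yield one dict per table row, keeping only the requested fields.

    Assumption: the payload looks like {"data": [{...}, {...}], ...}.
    Missing fields come back as None instead of raising KeyError.
    """
    payload = json.loads(body)
    for row in payload.get('data', []):
        yield {field: row.get(field) for field in fields}


# In the spider, parse_data could then be:
#     def parse_data(self, response):
#         yield from rows_from_quotes_json(response.text)

sample_body = '{"data": [{"symbol": "MSFT", "symbolName": "Microsoft", "lastPrice": "1.00"}]}'
for item in rows_from_quotes_json(sample_body):
    print(item)
```

Scrapy accepts plain dicts as items, so the rows can be exported directly with a feed export such as `-o rows.csv`. If the real payload is shaped differently, only the `payload.get('data', [])` line needs to change.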
Upvotes: 2