Reputation: 886
I have written a program in Python, using Selenium and BeautifulSoup, to scrape data from a website. To scrape a page of that website, like this one, I have to click on a tab named "سابقه" (at the top, in turquoise). The website then uses an Ajax request to get the data, and I loop through the resulting table. The table has more than one page, so I have to click on the numbers below the table and collect the new data again. My problem is that this method is really slow, because I have to collect data from 500 pages and every page contains 35 tables. Is there a faster way to do this? Or maybe a way to fire the Ajax request from within my program and get the response? A solution in Python would be preferable.
Upvotes: 0
Views: 1244
Reputation: 142641
The tab uses JavaScript to get all the data from the URL
http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx?i=9211775239375291&Top=999999&A=0
and afterwards only changes the data shown in the table. Other tabs use different URLs, but the rest should be similar.
You can use requests to get everything at once:
import requests

url = 'http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx?i=9211775239375291&Top=999999&A=0'
r = requests.get(url)
print(r.text[:50])  # first 50 chars

# rows are separated by ';' and fields within a row by '@'
data = r.text.split(';')
print('number of days:', len(data))

for row in data:  # use data[:5] to print only the first 5 rows
    row = row.split('@')
    print('date:', row[0], '|', row[1:4])  # first 3 values

Result (small preview):
20171213@901.00@863.00@893.00@...@859.00
number of days: 1202
date: 20171213 | ['901.00', '863.00', '893.00']
date: 20171212 | ['859.00', '859.00', '859.00']
date: 20171211 | ['823.00', '782.00', '819.00']
date: 20171210 | ['796.00', '780.00', '784.00']
date: 20171209 | ['797.00', '781.00', '787.00']
...
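If you need the rows as structured records rather than printed text, the same splitting can be wrapped in a small function. This is only a sketch: `DayRecord` and `parse_history` are hypothetical names, the meaning of the individual columns is not documented here, and it assumes every field after the date is numeric. The sample string is shortened to the first three values of two rows from the preview above.

```python
from typing import List, NamedTuple


class DayRecord(NamedTuple):
    date: str            # e.g. '20171213', kept as text
    values: List[float]  # remaining fields; their meaning is an open question


def parse_history(text: str) -> List[DayRecord]:
    """Split the raw response into one record per day."""
    records = []
    for row in text.split(';'):
        if not row.strip():
            continue  # skip empty chunks, e.g. after a trailing ';'
        date, *fields = row.split('@')
        # Assumption: all fields after the date parse as numbers.
        records.append(DayRecord(date, [float(f) for f in fields]))
    return records


# Illustrative sample only, not the full server response.
sample = '20171213@901.00@863.00@893.00;20171212@859.00@859.00@859.00;'
records = parse_history(sample)
print(records[0].date, records[0].values)
```

With real data you would call `parse_history(r.text)` on the response of the request.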
BTW: you could also do it with the standard module urllib.request,
but the server sends the data compressed with gzip,
so you would have to use the gzip module (for example gzip.GzipFile or gzip.decompress)
to decompress it manually.
Or you could try sending the request with the header Accept-Encoding: deflate
and check whether the server then sends the data without gzip compression.
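A minimal sketch of the standard-library variant, in case you cannot install requests. The helper names `decode_body` and `fetch_text` are made up for this example, and decoding the body as UTF-8 is an assumption about the server's response.

```python
import gzip
import urllib.request


def decode_body(raw: bytes, content_encoding: str = '') -> str:
    """Decompress gzip data if needed, then decode it to text."""
    # gzip streams start with the magic bytes 0x1f 0x8b, so check those too
    # in case the Content-Encoding header is missing.
    if 'gzip' in content_encoding.lower() or raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    return raw.decode('utf-8')  # assumption: the response is UTF-8/ASCII text


def fetch_text(url: str) -> str:
    """Download a URL with urllib and return its body as text."""
    request = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip'})
    with urllib.request.urlopen(request, timeout=30) as response:
        encoding = response.headers.get('Content-Encoding') or ''
        return decode_body(response.read(), encoding)
```

Calling `fetch_text(url)` with the URL from above should then give the same text as `requests.get(url).text`; requests simply does the decompression for you.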
I don't know whether the data URL
always has the same argument values
i=9211775239375291&Top=999999&A=0
but the value of i
also appears in the page URL
http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=9211775239375291
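Since you have about 500 pages, you could derive each data URL from its page URL instead of clicking through the site. The sketch below assumes that Top=999999 and A=0 stay the same for every instrument, which is not confirmed; `history_url` and `DATA_ENDPOINT` are names invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DATA_ENDPOINT = 'http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx'


def history_url(page_url: str) -> str:
    """Build the data URL from a page URL by reusing its 'i' argument."""
    query = parse_qs(urlsplit(page_url).query)
    instrument_id = query['i'][0]  # raises KeyError if the page URL has no 'i'
    # Assumption: 'Top' and 'A' are constant across instruments.
    params = {'i': instrument_id, 'Top': 999999, 'A': 0}
    return DATA_ENDPOINT + '?' + urlencode(params)


page = 'http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=9211775239375291'
print(history_url(page))
```

You can then loop over your list of page URLs and pass each result to requests.get, ideally with a short pause between requests so you don't overload the server.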
Upvotes: 2