Glyphack

Reputation: 886

Scraping data from a website that uses Ajax requests

I have made a program in Python, using Selenium and BeautifulSoup, for scraping data from a website. To scrape data from a page of that website, like this one, I have to click on a tab named "سابقه" ("History"; it is at the top, in turquoise). The website then uses an Ajax request to get the data, and after that I loop through the table. The table has more than one page, so I have to click on the page numbers below it and collect the new data each time. My problem is that this method is really slow, because I have to collect data from 500 pages and every page contains 35 tables. Is there a faster way to do this? Or maybe a way to fire the Ajax request from within my program and get the response? A solution in Python would be best.

Upvotes: 0

Views: 1244

Answers (1)

furas

Reputation: 142641

The tab uses JavaScript to get all the data at once from the URL

http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx?i=9211775239375291&Top=999999&A=0

and afterwards it only changes the data shown in the table. Other tabs use different URLs, but the rest should be similar.

You can use requests to get it all at once:

import requests

url = 'http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx?i=9211775239375291&Top=999999&A=0'

r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error
print(r.text[:50])  # first 50 chars

data = r.text.split(';')  # one entry per day
print('number of days:', len(data))

for row in data:  # use data[:5] instead to print only the first 5 rows
    row = row.split('@')  # fields of one day
    print('date:', row[0], '|', row[1:4])  # the date and the next 3 values

Result (small preview):

[email protected]@[email protected]@[email protected]@859.00

number of days: 1202

date: 20171213 | ['901.00', '863.00', '893.00']
date: 20171212 | ['859.00', '859.00', '859.00']
date: 20171211 | ['823.00', '782.00', '819.00']
date: 20171210 | ['796.00', '780.00', '784.00']
date: 20171209 | ['797.00', '781.00', '787.00']
...
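If you want the rows as Python values instead of printing them, a minimal sketch could wrap the same splitting in a function. It assumes (from the preview above) that days are separated by ';', fields by '@', and that the first field is a YYYYMMDD date; the other fields are kept as strings because their meaning is not documented here. The name parse_history is hypothetical, and the sample string is only the truncated first fields of two rows from the preview.

```python
from datetime import date, datetime
from typing import List, Tuple


def parse_history(text: str) -> List[Tuple[date, List[str]]]:
    """Split the raw response into (date, remaining fields) pairs.

    Assumption: rows are separated by ';', fields by '@', and the first
    field of each row is a date in YYYYMMDD format.
    """
    rows = []
    for chunk in text.split(';'):
        if not chunk:  # skip empty pieces, e.g. from a trailing ';'
            continue
        fields = chunk.split('@')
        day = datetime.strptime(fields[0], '%Y%m%d').date()
        rows.append((day, fields[1:]))  # other fields left as strings
    return rows


# Truncated sample built from the preview above (only the first 4 fields per row).
sample = '20171213@901.00@863.00@893.00;20171212@859.00@859.00@859.00'
for day, values in parse_history(sample):
    print(day.isoformat(), values[:3])
```

With the real response you would call parse_history(r.text) instead of using the sample.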

BTW: you could also do it with the standard module urllib.request, but the server sends the data compressed with gzip, so you would have to use the gzip module (for example gzip.decompress()) to uncompress it manually.
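A minimal standard-library sketch of that approach: read the raw bytes, check the Content-Encoding response header, and run gzip.decompress() only when the body is actually gzipped. The function names decode_body and fetch_text are hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = 'utf-8') -> str:
    """Undo gzip compression (only if the server applied it) and decode to text."""
    if content_encoding and content_encoding.lower() == 'gzip':
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url: str, timeout: float = 30) -> str:
    """Fetch a URL with urllib.request and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        encoding = resp.headers.get('Content-Encoding')
        # Assumption: default to UTF-8 when no charset is declared.
        charset = resp.headers.get_content_charset() or 'utf-8'
    return decode_body(raw, encoding, charset)
```

Usage would be text = fetch_text(url) with the same url as in the requests example; the result can then be split on ';' and '@' the same way.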

Or you could try sending the request with the header Accept-Encoding: identity, which tells the server that you want the data uncompressed.
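A minimal sketch of setting that header with urllib.request. The default value identity is the standard HTTP token for "no compression"; whether this particular server honours it is an assumption, so the function (hypothetically named build_request) takes the value as a parameter.

```python
import urllib.request

URL = 'http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx?i=9211775239375291&Top=999999&A=0'


def build_request(url: str, accept_encoding: str = 'identity') -> urllib.request.Request:
    """Build a GET request that states which content encodings we accept.

    Assumption: the server respects Accept-Encoding; it may still compress.
    """
    return urllib.request.Request(url, headers={'Accept-Encoding': accept_encoding})


req = build_request(URL)
# urllib stores header names with only the first letter capitalized.
print(req.get_header('Accept-encoding'))
```

You would then pass req to urllib.request.urlopen(). Checking the Content-Encoding header of the response is still worthwhile, in case the server compresses anyway.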


I don't know whether the URL always uses the same argument values

i=9211775239375291&Top=999999&A=0

but the value of i also appears in the page URL

http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=9211775239375291
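Since i appears in both URLs, a possible sketch is to read it from each page URL and build the data URL from it, instead of opening the page in Selenium. This assumes that i in the page URL is the same id the data URL expects and that Top and A can keep the values seen above; history_url is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlparse

DATA_URL = 'http://members.tsetmc.com/tsev2/data/InstTradeHistory.aspx'


def history_url(page_url: str, top: int = 999999, a: int = 0) -> str:
    """Build the trade-history data URL from an instrument page URL.

    Assumption: the page's `i` parameter is the id the data URL expects,
    and Top/A can stay at the values seen in the original URL.
    """
    query = parse_qs(urlparse(page_url).query)
    instrument_id = query['i'][0]  # raises KeyError if the page URL has no `i`
    return DATA_URL + '?' + urlencode({'i': instrument_id, 'Top': top, 'A': a})


print(history_url('http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=9211775239375291'))
```

For the 500 pages in the question, you could loop over their page URLs and call requests.get(history_url(page_url)) for each one. Reusing a single requests.Session keeps the connection open between requests.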

Upvotes: 2
