Job scheduling for data scraping in Python

Question

I'm scraping data from a website. I need two values from it: the (grid) frequency and the time.

The data on the website is updated every second. I'd like to continuously save these values (append them) to a list or a tuple using Python. To do that, I tried the schedule library. The following commands run the scraping function (socket_freq) every second:

import schedule
schedule.every(1).seconds.do(socket_freq)

while True:
    schedule.run_pending()

I'm facing two problems:

1. I don't know how to restrict the schedule to a chosen time interval. For example, I'd like to run it for 5 or 10 minutes. How do I tell the schedule to stop after a certain time?
2. If I run this code and stop it after a few seconds (using break), I often get duplicate entries. Here is one result, where the first list in the tuple holds the time values and the second list holds the frequency values:

out:

(['19:27:02','19:27:02','19:27:02','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:04','19:27:04','19:27:04', ...], 
['50.020','50.020','50.020','50.018','50.018','50.018','50.018','50.018','50.018','50.018','50.017','50.017','50.017'...])

As you can see, each time value is appended multiple times, even though I used a schedule that runs every second. What I would actually expect to retrieve is:

out:

(['19:27:02','19:27:03','19:27:04'],['50.020','50.018','50.017'])

Does anybody know how to solve these problems?

Thanks!

(I'm using Python 2.7.9)

Answers (1)
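
A minimal sketch that addresses both problems, assuming socket_freq can be adapted to return a (time, frequency) pair instead of appending to the lists itself. That is an assumption, since the function body is not shown in the question. The names collect, read_sample, duration_s and interval_s are hypothetical. The sketch uses a plain time-based loop from the standard library rather than schedule: it stops once a deadline has passed, sleeps between polls so the loop does not spin, and stores a sample only when its timestamp differs from the last stored one.

```python
import time


def collect(read_sample, duration_s, interval_s=1.0,
            clock=time.time, sleep=time.sleep):
    """Poll read_sample() for duration_s seconds.

    read_sample is assumed to return a (timestamp, frequency) pair.
    clock and sleep are injectable so the loop can be tested without waiting.
    Returns a tuple (times, freqs) with one entry per distinct timestamp.
    """
    times, freqs = [], []
    deadline = clock() + duration_s

    # Problem 1: stop once the chosen time interval has elapsed.
    while clock() < deadline:
        stamp, freq = read_sample()

        # Problem 2: skip a sample whose timestamp was already stored.
        if not times or times[-1] != stamp:
            times.append(stamp)
            freqs.append(freq)

        # Wait before polling again instead of spinning in a tight loop.
        sleep(interval_s)

    return times, freqs
```

For a 5-minute run this could be called as collect(socket_freq, 300). If you would rather keep schedule, the same deadline check can replace `while True` around schedule.run_pending(), with a short time.sleep() inside the loop.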
