Reputation: 67
I run a script that extracts data from a JSON file which is updated every 1 to 2 minutes. The basic concept is that the script executes the extraction procedure, sleeps for one minute, and then executes the extraction procedure again, in an infinite loop.
It worked fine for more than a month, then stopped suddenly one day without any error message. I restarted it and it worked fine again, but after a few days it stopped once more for no apparent reason.
I have no idea what the problem is, so all I can do is provide my script. Below is the Python file I wrote.
from requests.auth import HTTPBasicAuth
import sys
import requests
import re
import time
import datetime
import json
from CSVFileGen1 import csv_files_generator1
from CSVFileGen2 import csv_files_generator2
from CSVFileGen3 import csv_files_generator3
from CSVFileGen4 import csv_files_generator4
def passpara():
    current_time = datetime.datetime.now()
    current_time_string = current_time.strftime('%Y-%m-%d %H:%M:%S')
    sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
    FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
    FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
    FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
    FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
    try:
        r1 = requests.get('https://www...=JSON')
        json_text_no_lines1 = r1.text
        csv_files_generator1(current_time, json_text_no_lines1, FileLocation1)
    except requests.exceptions.RequestException as e:
        print 'request1 error'
        print e
    try:
        r2 = requests.get('https://www...=JSON')
        json_text_no_lines2 = r2.text
        csv_files_generator2(current_time, json_text_no_lines2, FileLocation2)
    except requests.exceptions.RequestException as e:
        print 'request2 error'
        print e
    try:
        r3 = requests.get('https://www...=JSON')
        json_text_no_lines3 = r3.text
        csv_files_generator3(current_time, json_text_no_lines3, FileLocation3)
    except requests.exceptions.RequestException as e:
        print 'request3 error'
        print e
    try:
        r4 = requests.get('https://www...JSON')
        json_text_no_lines4 = r4.text
        csv_files_generator4(current_time, json_text_no_lines4, FileLocation4)
    except requests.exceptions.RequestException as e:
        print 'request4 error'
        print e
    print current_time_string + ' Data Operated. '

while True:
    passpara()
    time.sleep(60)
Here is CSVFileGen1, which the first script calls. This script parses the JSON file and saves the information to a CSV file.
import json
import datetime
import time
import os.path
import sys
from datetime import datetime
from dateutil import tz
def meter_per_second_2_mile_per_hour(input_meter_per_second):
    return input_meter_per_second * 2.23694

def csv_files_generator1(input_datetime, input_string, target_directory):
    try:
        real_json = json.loads(input_string)
        # get updatetime string
        updatetime_epoch = real_json['updateTime']
        update_time = datetime.fromtimestamp(updatetime_epoch/1000)
        updatetime_string = update_time.strftime('%Y%m%d%H%M%S')
        file_name = update_time.strftime('%Y%m%d%H%M')
        dir_name = update_time.strftime('%Y%m%d')
        if not os.path.exists(target_directory + '\\' + dir_name):
            os.makedirs(target_directory + '\\' + dir_name)
        if not os.path.isfile(target_directory + '\\' + dir_name + '\\' + file_name):
            ...  # some detailed information, deleted for simplicity
    except ValueError, e:
        print e
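For reference, the updateTime handling above boils down to the following sketch (the epoch value is made up for illustration, and utcfromtimestamp is used here so the output does not depend on the local timezone, whereas the script uses local-time fromtimestamp):

```python
from datetime import datetime

# made-up example value: milliseconds since the Unix epoch
updatetime_epoch = 1514764800000

# divide by 1000 to get seconds before converting
update_time = datetime.utcfromtimestamp(updatetime_epoch / 1000)
print(update_time.strftime('%Y%m%d%H%M%S'))  # 20180101000000
```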
Upvotes: 1
Views: 1458
Reputation: 549
I believe your question has already been answered regarding why your script may fail, so I won't duplicate that answer.
However, I will offer an alternative solution. Instead of having your script run for days on end, remove the infinite loop and set it up to run every minute with Task Scheduler (Windows) or cron (Linux). This has a couple of immediate benefits:
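For example, a crontab entry along these lines would run the script once a minute (the interpreter and script paths here are placeholders, not taken from the question):

```shell
# run the extraction once per minute; paths are hypothetical
* * * * * /usr/bin/python /path/to/extract_script.py >> /path/to/extract.log 2>&1
```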
Upvotes: 0
Reputation: 191
At first glance, I think the problem is sys.path growing without bound (as litelite mentioned): every call to passpara() appends the same entry to the list again. You can safely move this block of code outside the function so it runs only once (only append to sys.path once):
sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
So, your code would look like:
sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
while True:
    passpara()
    time.sleep(60)
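Alternatively, if the append has to stay inside the function, a membership guard keeps sys.path from growing on repeated calls (a sketch, not from the original code; the helper name is made up):

```python
import sys

# the path from the question, held in a hypothetical constant
TNTOOL_PATH = 'C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool'

def add_tntool_path():
    # append only if the entry is not already present
    if TNTOOL_PATH not in sys.path:
        sys.path.append(TNTOOL_PATH)

add_tntool_path()
add_tntool_path()  # second call is a no-op
print(sys.path.count(TNTOOL_PATH))  # 1
```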
When I tried a program that infinitely appends to sys.path, my RAM usage grew very heavily. You may want to look into the memory usage of your script, since Python may be hanging once it runs out of memory. After a few minutes of running this script, my Chrome window crashed because Python was using around 10 GB of RAM (all available RAM).
Please note that I did not use a time.sleep(). The results obtained after running it without any pauses for a few minutes may resemble what you would see when running your script every 60 seconds for a month.
My program is as follows:
import sys
while True:
    sys.path.append("C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool")
Interesting note: simply incrementing a variable in a while loop does not rapidly consume large amounts of RAM, mainly because the variable is overwritten each time and does not take up extra memory. In your case, sys.path is a list, and appending to it infinitely causes extra RAM to be used. Example program:
count = 0
while True:
    count += 1
On the other hand, appending to a list heavily uses RAM, which is to be expected:
count = []
while True:
    count.append(1)
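A bounded version of the same experiment (iterations capped so it terminates) makes the growth measurable with sys.getsizeof; note this measures only the list object itself, not the strings it references:

```python
import sys

path_like = []
snapshots = []
for i in range(100000):
    path_like.append("C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool")
    if i % 20000 == 0:
        # record the size of the list object at regular intervals
        snapshots.append(sys.getsizeof(path_like))

# the list's footprint never shrinks as it grows
assert all(a <= b for a, b in zip(snapshots, snapshots[1:]))
print(snapshots[0], snapshots[-1])
```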
Upvotes: 1