Reputation: 803
I'm pulling about 100,000 values from a spreadsheet and grabbing the first Google search result for each to see whether it is http or https. The script works fine (fine enough for my purposes), but I get a 503 error after about the 70th iteration of the loop.
Any thoughts/ideas/suggestions on how to get the number of queries I need?
Code:
import pandas as pd
import time
from google import search  # the search() generator used below comes from the "google" package

library_list = pd.read_csv("PLS_FY2014_AE_pupld14a.csv")

zero = 0        # how many names have been processed so far
with_https = 0  # how many first results were served over https

for i in library_list['LIBNAME']:
    # Only the first search result is requested for each library name
    for url in search(library_list['LIBNAME'][zero], num=1, start=0, stop=1):
        time.sleep(5)
        zero += 1
        print(zero)
        if 'https' in url:
            with_https += 1
Upvotes: 4
Views: 3188
Reputation: 63
I'm trying to do the same thing and was getting the 503 error after 30-50 results. I ended up forcing the search to wait a random time between 30 and 60 seconds per query. I've read of others hitting the same issue, and they said Google limits bot searches to around 50 per hour. The code I used is:
import os, arcpy, urllib, ssl, time, datetime, random, errno
from datetime import datetime
from arcpy import env
from distutils.dir_util import copy_tree
try:
    from google import search
except ImportError:
    print("No module named 'google' found")

# "facilities" is the feature class this script updates (set up earlier in my script)
with arcpy.da.UpdateCursor(facilities, ["NAME", "Weblinks", "ADDRESSSTATECODE", "MP_TYPE"]) as rows:
    for row in rows:
        if row[1] is None:          # only fill in rows that have no web link yet
            if row[3] != "xxxxxx":
                query = str(row[0])
                print("The query will be " + query)
                # Random 30-60 second pause keeps the searches under Google's rate limit
                wt = random.uniform(30, 60)
                print("Script will wait " + str(wt) + " seconds before the next search.")
                for j in search("recreation.gov " + query + ", " + str(row[2]), tld="co.in", num=1, stop=1, pause=wt):
                    row[1] = str(j)
                    rows.updateRow(row)
                    print(row[1])
                time.sleep(5)
                print("")
My script has been running non-stop for 7 days now with no more errors. It may be slow, but it will eventually get the job done. I'm doing about 18,000 searches with it this round.
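If you want to apply the same trick to the pandas loop from your question without the arcpy parts, here is a minimal sketch (untested against your data; it assumes the same google package, whose search() function takes a pause argument giving the seconds to wait before each HTTP request, plus the CSV file and LIBNAME column from your question):
import random
import pandas as pd
from google import search  # same package as above; its search() accepts pause=

library_list = pd.read_csv("PLS_FY2014_AE_pupld14a.csv")
with_https = 0

for count, name in enumerate(library_list['LIBNAME'], start=1):
    # Randomize the delay per query so the traffic looks less bot-like;
    # search() sleeps for pause seconds before each request it sends.
    wt = random.uniform(30, 60)
    for url in search(str(name), num=1, start=0, stop=1, pause=wt):
        if 'https' in url:
            with_https += 1
    print(count)
Be warned that at an average 45-second pause, 100,000 queries work out to roughly 52 days of run time, so it's worth writing the count (or each result) to disk as you go in case the script dies partway through.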
Upvotes: 6