Reputation: 201
Currently running the below script which checks for errors within a long list of urls. This code first looks for unique urls within df['Final_URL'], tests each separate url and returns the status of that link url. When I run the below code I get the current output on my notebook which is fine. Now I would like to push the status code (e.g 200, 404, BAD etc) to a new column in my df called "Status" for all url which equal to the unique urls I got at the beginning of the code.
What's the best way to create the new column df['Status'] and since I want to export this to google sheets do you know if text color is preserved when updating cells using pygsheets?
Input code:
#get unique urls and check for errors
URLS = []
for unique_link in df['Final_URL'].unique():
URLS.append(unique_link)
try:
GREEN = '\033[92m'
YELLOW = '\033[93m'
RED = '\033[91m'
ENDC = '\033[0m'
def main():
while True:
print ("\nTesting URLs.", time.ctime())
checkUrls()
time.sleep(10) #Sleep 10 seconds
break
def checkUrls():
for url in URLS:
status = "N/A"
try:
#check if regex contains bet3.com
if re.search(".*bet3\.com.*", url):
status = checkUrl(url)
else:
status = "BAD"
except requests.exceptions.ConnectionError:
status = "DOWN"
printStatus(url, status)
#for x in df['Final_URL']:
# if x == url:
# df['Status'] = printStatus(status)
def checkUrl(url):
r = requests.get(url, timeout=5)
#print r.status_code
return str(r.status_code)
def printStatus(url, status):
color = GREEN
if status != "200":
color=RED
print (color+status+ENDC+' '+ url)
#
# Main app
#
if __name__ == '__main__':
main()
except:
print('Something went wrong!')
Current output:
200 https://www.bet3.com/dl/~offer
404 http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus
BAD https://extra.betting3.com/features/en/bet-builder
200 https://www.bet3.com/dl/6
Upvotes: 0
Views: 121
Reputation: 3297
You can just rewrite your function as such
def checkUrl(url):
if re.search(".*bet3\.com.*", url):
try:
r = requests.get(url, timeout=5)
except requests.exceptions.ConnectionError:
return 'DOWN'
return str(r.status_code)
return 'BAD'
and then apply it like this
df['Status'] = df['Final_URL'].apply(checkUrl)
although, as noticed by user32185, in case there's duplicate URLs, this will call them twice.
To avoid that you can do as user32185 suggested and write your function like so:
def checkUrls(urls):
results = []
for url in urls:
if re.search(".*bet3\.com.*", url):
try:
r = requests.get(url, timeout=5)
except requests.exceptions.ConnectionError:
results.append([url, 'DOWN'])
results.append([url, str(r.status_code)])
else:
results.append([url, 'BAD'])
return pd.DataFrame(data=results, columns=['Final_URL', 'Status'])
and then use it like this:
status_df = checkUrls(df['Final_URL'].unique())
df = df.merge(status_df, how='left', on='Final_URL')
Upvotes: 2