Reputation: 715
I'm new to Python and currently trying to achieve the following:
I want to check HTTP response status codes for multiple URLs in my input.csv file:
id url
1 https://www.google.com
2 https://www.example.com
3 https://www.testtesttest.com
...
and save the results in my output.csv file as an additional 'status' column that flags the URLs that are down or have other issues:
id url status
1 https://www.google.com All good!
2 https://www.example.com All good!
3 https://www.testtesttest.com Down
...
So far I have tried the following, without success:
import requests
import pandas as pd
import requests.exceptions

df = pd.read_csv('path/to/my/input.csv')
urls = df.T.values.tolist()[1]

try:
    r = requests.get(urls)
    r.raise_for_status()
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
    print "Down"
except requests.exceptions.HTTPError:
    print "4xx, 5xx"
else:
    print "All good!"
I'm not sure how to collect the results from the above and save them as a new column in the output.csv file:
df['status'] = #here the result
df.to_csv('path/to/my/output.csv', index=False)
Would someone be able to help with this? Thanks in advance!
Upvotes: 1
Views: 1130
Reputation: 16683
id url
1 https://www.google.com
2 https://www.example.com
3 https://www.testtesttest.com
Copy the table above to the clipboard, then run the code below. You need to loop through the URLs, appending each status to a list, and then set the list as a new column.
import requests
import pandas as pd
import requests.exceptions

# Read the table from the clipboard into a DataFrame
df = pd.read_clipboard()
df

urls = df['url'].tolist()
status = []
for url in urls:
    try:
        # Without an explicit timeout, requests.exceptions.Timeout would never be raised
        r = requests.get(url, timeout=10)
        r.raise_for_status()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        status.append("Down")
    except requests.exceptions.HTTPError:
        status.append("4xx, 5xx")
    else:
        status.append("All good!")

# Attach the results as a new column and write the output file
df['status'] = status
df.to_csv('path/to/my/output.csv', index=False)
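If you would rather read straight from your input.csv instead of the clipboard, the same check can be wrapped in a small function and applied to the url column. Here is a minimal sketch along those lines, using the placeholder paths from the question; check_url is just an illustrative helper name, not part of any library:

import requests
import requests.exceptions
import pandas as pd

def check_url(url):
    # Return a status label for a single URL
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        return "Down"
    except requests.exceptions.HTTPError:
        return "4xx, 5xx"
    return "All good!"

# Paths are placeholders; adjust them to your actual files
df = pd.read_csv('path/to/my/input.csv')
df['status'] = df['url'].apply(check_url)
df.to_csv('path/to/my/output.csv', index=False)

Series.apply keeps the results in row order, so each status lines up with its original id without managing a separate list.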
Upvotes: 2