Reputation: 5477
Today I was working with the Alexa API to get a site's popularity rank using this code:
import urllib.request, sys, re
site = 'https://stackoverflow.com/questions/'
xml = urllib.request.urlopen('http://data.alexa.com/data?cli=10&dat=s&url=%s'%site).read()
try: rank = int(re.search(r'<POPULARITY[^>]*TEXT="(\d+)"', xml).groups()[0])
except: rank = -1
print('Your rank for %s is %d!\n' % (site, rank))
It was working perfectly, but it suddenly stopped. I checked the API link manually:
http://data.alexa.com/data?cli=10&dat=s&url=https://stackoverflow.com/questions/
and it just returns the word "Okay" rather than an XML string. What is the problem?
Upvotes: 2
Views: 7322
Reputation: 331
Alexa rank has moved and is now offered through a paid API: https://awis.alexa.com/developer-guide. That said, it is not expensive: https://aws.amazon.com/marketplace/pp/B07Q71HJ3H
Upvotes: 0
Reputation: 61
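This might be what you are looking for:
from bs4 import BeautifulSoup
import urllib.request
url = 'wikipedia.com'
# Fetch the Alexa mini site-info page for the given domain.
page = urllib.request.urlopen('https://www.alexa.com/minisiteinfo/' + url)
soup = BeautifulSoup(page, 'html.parser')
# The rank is the text of the first link inside the first table.
rank_str = soup.table.a.get_text()
# Strip thousands separators before converting to an integer.
rank_int = int(rank_str.replace(',', ''))
print(rank_int)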
Upvotes: 6
Reputation: 1985
That "Okay" means that the IP you are running the script from has been blacklisted by Alexa.
If you run it from a different IP, it will work. That said, I have no idea what request rate or limit causes an IP to be blacklisted.
Upvotes: 1
Reputation: 525
That link seems to work fine for me when I tried it in Chrome and in Postman. Are you saying that the regex is returning "Okay"?
Also, the response from that link is not JSON; it is XML. Instead of using a regex to parse XML, I would suggest that you use an XML module.
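As a rough illustration of that suggestion, here is a minimal sketch using the standard library's xml.etree.ElementTree. The SAMPLE payload and the parse_rank helper are hypothetical: the sample is only inferred from the POPULARITY/TEXT pattern in the question's regex, and the real response may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload (an assumption, not a captured response).
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<ALEXA VER="0.9" URL="stackoverflow.com/">'
    b'<SD><POPULARITY URL="stackoverflow.com/" TEXT="55"/></SD>'
    b'</ALEXA>'
)


def parse_rank(payload: bytes) -> int:
    """Return the TEXT attribute of the first POPULARITY element, or -1."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # A non-XML body such as the plain word "Okay" ends up here.
        return -1
    node = root.find('.//POPULARITY')
    if node is None:
        return -1
    try:
        return int(node.get('TEXT', ''))
    except ValueError:
        # Attribute missing or not a number.
        return -1


print(parse_rank(SAMPLE))
print(parse_rank(b'Okay'))
```

Because ElementTree raises ParseError on a non-XML body, this also distinguishes the "Okay" response from a well-formed document that simply lacks a rank.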
Edit: I just tried your code and it worked, although I needed to convert the response to a string (it came in as a bytes-like object) before passing it to the regex.
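The conversion mentioned in the edit can be sketched roughly as follows. In Python 3, urlopen(...).read() returns bytes, and searching bytes with a str pattern raises a TypeError, which the bare except in the question would silently turn into -1. The rank_from_response helper is a hypothetical name, and UTF-8 is an assumed encoding.

```python
import re

# Same pattern as in the question, compiled once.
RANK_RE = re.compile(r'<POPULARITY[^>]*TEXT="(\d+)"')


def rank_from_response(raw: bytes) -> int:
    """Decode the raw response body, then search it with the str pattern."""
    # Assumption: the body is UTF-8; undecodable bytes are replaced.
    text = raw.decode('utf-8', errors='replace')
    match = RANK_RE.search(text)
    return int(match.group(1)) if match else -1


print(rank_from_response(b'<POPULARITY URL="example.com/" TEXT="42"/>'))
print(rank_from_response(b'Okay'))
```

Checking the match explicitly instead of wrapping everything in a bare except keeps a genuine "no rank in the body" case separate from programming errors such as the bytes/str mismatch.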
Upvotes: 0