Reputation: 4839
I am using Python 2.7 with the wikipedia package to retrieve the text of multiple random Wikipedia pages, as explained in the docs.
I use the following code:
    def get_random_pages_summary(pages = 0):
        import wikipedia
        page_names = [wikipedia.random(1) for i in range(pages)]
        return [[p,wikipedia.page(p).summary] for p in page_names]

    text = get_random_pages_summary(50)
and I get the following error:
      File "/home/user/.local/lib/python2.7/site-packages/wikipedia/wikipedia.py", line 393, in __load
        raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
    wikipedia.exceptions.DisambiguationError: "Priuralsky" may refer to:
    Priuralsky District
    Priuralsky (rural locality)
What I am trying to do is get the text from random Wikipedia pages, and I need it to be just regular text, without any markdown.
I assume the problem is that a random title can match more than one Wikipedia page. When I use the same code to fetch a single Wikipedia page, it works well.
Thanks
Upvotes: 1
Views: 4789
Reputation: 615
According to the documentation (http://wikipedia.readthedocs.io/en/latest/quickstart.html), the DisambiguationError carries the list of candidate pages in its options attribute, so you need to look up each candidate again.
    try:
        wikipedia.summary("Priuralsky")
    except wikipedia.exceptions.DisambiguationError as e:
        for page_name in e.options:
            print(page_name)
            print(wikipedia.page(page_name).summary)
You can improve your code like this:
    import wikipedia

    def get_page_sumarries(page_name):
        try:
            return [[page_name, wikipedia.page(page_name).summary]]
        except wikipedia.exceptions.DisambiguationError as e:
            return [[p, wikipedia.page(p).summary] for p in e.options]

    def get_random_pages_summary(pages=0):
        ret = []
        page_names = [wikipedia.random(1) for i in range(pages)]
        for p in page_names:
            for page_summary in get_page_sumarries(p):
                ret.append(page_summary)
        return ret

    text = get_random_pages_summary(50)
Upvotes: 2
Reputation: 824
Since you are fetching random articles through a Wikipedia API wrapper (rather than pulling the HTML directly with other tools), my suggestion would be to catch the DisambiguationError and simply draw a new random article whenever it occurs.
    import wikipedia

    def random_page():
        title = wikipedia.random(1)
        try:
            result = wikipedia.page(title).summary
        except wikipedia.exceptions.DisambiguationError:
            # Ambiguous title: draw another random page instead
            result = random_page()
        return result
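
If you need a fixed number of pages, the same retry idea can be written as a bounded loop instead of recursion. The following is only a sketch under a few assumptions: the helper names collect_random_summaries and wikipedia_summaries, the max_attempts cap, and the choice to also skip wikipedia.exceptions.PageError are mine, not something the package prescribes. The lookup functions are passed in as arguments so the loop itself can be tried without network access.

```python
def collect_random_summaries(count, random_title, fetch_summary,
                             skip_errors=(Exception,), max_attempts=None):
    """Collect up to `count` [title, summary] pairs, skipping failed lookups.

    random_title  -- callable returning one page title (hypothetical hook)
    fetch_summary -- callable mapping a title to its summary text
    skip_errors   -- exception types that mean "draw another title"
    max_attempts  -- cap on draws so the loop always ends (assumed default: 5x count)
    """
    if max_attempts is None:
        max_attempts = count * 5
    results = []
    attempts = 0
    while len(results) < count and attempts < max_attempts:
        attempts += 1
        title = random_title()
        try:
            summary = fetch_summary(title)
        except skip_errors:
            continue  # ambiguous or missing title: try a different one
        results.append([title, summary])
    return results


def wikipedia_summaries(count):
    """Wire the helper to the wikipedia package (needs network access)."""
    import wikipedia
    return collect_random_summaries(
        count,
        random_title=wikipedia.random,
        fetch_summary=lambda title: wikipedia.page(title).summary,
        skip_errors=(wikipedia.exceptions.DisambiguationError,
                     wikipedia.exceptions.PageError),
    )
```

With this, text = wikipedia_summaries(50) would play the role of get_random_pages_summary(50) from the question. If too many draws fail, the list can come back shorter than requested, so check its length if you need exactly 50 entries.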
Upvotes: 4