Reputation: 113
Problem: I have a text file with names written in Russian. For each line in the file, I build a request to Wikipedia using that line as the page title. Then I want to collect information about all the images on that page.
Program:
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

with open('names-video.txt', "r", encoding='Windows-1251') as file:
    for line in file.readlines():
        print(line)
        name = "_".join(line.split())
        print(name)
        html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
        bs = BeautifulSoup(html, 'html.parser')
        images = bs.findAll('img', {'src': re.compile('.jpg')})
        print(images[0])
names-video.txt:
Алимпиев, Виктор Гелиевич
Андреев, Алексей Викторович (художник)
Баевер, Антонина
Булдаков, Алексей Александрович
Жестков, Максим Евгеньевич
Канис, Полина Владимировна
Мустафин, Денис Рафаилович
Преображенский, Кирилл Александрович
Селезнёв, Владимир Викторович
Сяйлев, Андрей Фёдорович
Шерстюк, Татьяна Александровна
Error message:
error from callback <bound method SocketHandler.handle_message of <amino.socket.SocketHandler object at 0x0000018B92600FA0>>: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\websocket\_app.py", line 344, in _callback
    callback(*args)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 80, in handle_message
    self.client.handle_socket_message(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\client.py", line 345, in handle_socket_message
    return self.callbacks.resolve(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 204, in resolve
    return self.methods.get(data["t"], self.default)(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 192, in _resolve_chat_message
    return self.chat_methods.get(key, self.default)(data)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 221, in on_text_message
    def on_text_message(self, data): self.call(getframe(0).f_code.co_name, objects.Event(data["o"]).Event)
  File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 209, in call
    handler(data)
  File "C:\Users\1\Desktop\python-bots\music_bot\bot.py", line 56, in on_text_message
    html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1266, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1104, in putrequest
    self._output(self._encode_request(request))
  File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1184, in _encode_request
    return request.encode('ascii')
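The last frame is where it breaks: http.client encodes the whole request line with request.encode('ascii'). A minimal sketch reproducing just that step in isolation, assuming the request line is built roughly as "<method> <path> <HTTP version>" (the exact string here is illustrative), using the first name from names-video.txt:

```python
# Illustrative request line, shaped roughly like the one http.client's
# putrequest() builds: "<method> <path> <HTTP version>".
name = "_".join("Алимпиев, Виктор Гелиевич".split())
request_line = f"GET /wiki/{name} HTTP/1.1"

# http.client encodes the request line as ASCII before sending it,
# which fails at the first Cyrillic character.
try:
    request_line.encode("ascii")
    message = "encoded fine"
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

"GET /wiki/" is 10 characters long and "Алимпиев" is 8, so the failing run is positions 10-17, the same range reported in the error message above.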
Question: For some reason the code breaks on urlopen(). print(line) and print(name) work just fine. What could the problem be? I've been trying to tackle this issue for quite a while and would appreciate any solution. Thanks in advance.
Upvotes: 0
Views: 70
Reputation: 10827
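You'll need to percent-encode the non-ASCII characters to make the URL a valid URI. As the last frame of the traceback shows, http.client encodes the request line with request.encode('ascii'), so the raw Cyrillic title fails before anything is sent: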
from urllib.parse import quote
...
name = "_".join(line.split())
# Percent-encode the title (quote() encodes it as UTF-8 first)
name = quote(name)
print(name)
...
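Putting the fix together with the original goal (collecting the .jpg images on each page), here is a minimal sketch using only the standard library. It swaps BeautifulSoup for html.parser.HTMLParser so it runs without third-party packages, and it assumes the pages are served as UTF-8. The helper names wiki_url, JpgCollector, jpg_images and main are hypothetical. Note that the file's Windows-1251 encoding only matters when reading it; once a line is a str, quote() percent-encodes it as UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import urlopen

WIKI_BASE = "https://ru.wikipedia.org/wiki/"


def wiki_url(line: str) -> str:
    """Turn one line of names-video.txt into a percent-encoded article URL."""
    title = "_".join(line.split())   # spaces -> underscores; also drops the trailing newline
    return WIKI_BASE + quote(title)  # UTF-8 percent-encoding keeps the URL pure ASCII


class JpgCollector(HTMLParser):
    """Collect the src of every <img> whose src contains '.jpg' (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""  # attribute values can be None
            if ".jpg" in src.lower():
                self.sources.append(src)


def jpg_images(url: str) -> list[str]:
    """Download one page and return the .jpg image sources found on it."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    parser = JpgCollector()
    parser.feed(html)
    return parser.sources


def main(path: str = "names-video.txt") -> None:
    with open(path, encoding="Windows-1251") as file:
        for line in file:
            if not line.strip():
                continue  # skip blank lines
            url = wiki_url(line)
            images = jpg_images(url)
            print(url, images[0] if images else "no .jpg images found")
```

Call main() from your own script (for example, from the bot handler). Titles that don't match an existing article make urlopen raise urllib.error.HTTPError, which this sketch does not catch.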
Upvotes: 1