Reputation: 41
I am writing a Python script that should fetch Arabic text from a site, but when I use requests to get the text, all I get is:
æóæÇÌóÒóÚÇð áóæ ßÇäó áöáäóÝÓö ãóÌÒóÚõ
instead of this:
اذا ما مَشَت نادى بما في ثِيابها ذكِيُّ الشذا والمَندَليّ المطَيَّرُ
I tried the same code on a different site that also uses Arabic, and it worked perfectly, grabbing the Arabic text without any problems.
from bs4 import BeautifulSoup
import requests
a = requests.get("https://www.aldiwan.net/poem30.html")
a = a.text
Upvotes: 2
Views: 911
Reputation: 1
Try this:
from bs4 import BeautifulSoup
import requests
a = requests.get("https://www.aldiwan.net/poem30.html")
a = a.content.decode('utf-8')
Upvotes: 0
Reputation: 1470
You have to decode a.content (the raw bytes) yourself instead of relying on a.text.
I tried decoding it as utf-8, but that kept failing, so I looked at the page at that URL: it declares a specific charset, windows-1256.
I decoded a.content with that same charset, and voila!
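A minimal sketch of that fix, assuming the page really is served as windows-1256 as described above. The helper names decode_page and fetch_text and the sample string are made up for illustration; the demonstration at the bottom runs offline and shows why the wrong charset produces garbage while the right one recovers the text.

```python
def decode_page(raw: bytes, charset: str = "windows-1256") -> str:
    """Decode raw response bytes (e.g. response.content) with an explicit charset."""
    return raw.decode(charset)


def fetch_text(url: str, charset: str = "windows-1256") -> str:
    """Hypothetical helper: download a page and decode it with the given charset."""
    import requests  # imported lazily so the offline demo below does not need it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Use the raw bytes, not response.text, so requests does not guess the charset.
    return decode_page(response.content, charset)


# Offline demonstration with made-up sample text (not taken from the site).
sample = "مرحبا"
raw = sample.encode("windows-1256")  # bytes as a windows-1256 page would send them

garbled = raw.decode("latin-1")      # wrong charset: Latin letters with accents
fixed = decode_page(raw)             # right charset: the original Arabic text
```

With the URL from the question, the call would be fetch_text("https://www.aldiwan.net/poem30.html"). An alternative that should give the same result is to set a.encoding = "windows-1256" on the response before reading a.text, since requests uses that attribute when decoding.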
Upvotes: 1