Jamichael999

Reputation: 41

Python requests shows garbled text instead of Arabic

I am making a Python script, and I want it to get Arabic text from a site, but when I use requests to get the text, all I get is:

æóæÇÌóÒóÚÇð áóæ ßÇäó áöáäóÝÓö ãóÌÒóÚõ

instead of this:

اذا ما مَشَت نادى بما في ثِيابها ذكِيُّ الشذا والمَندَليّ المطَيَّرُ

I tried the same code on a different site that also uses Arabic, and it worked perfectly: it grabbed the Arabic text without any problems.

from bs4 import BeautifulSoup
import requests

a = requests.get("https://www.aldiwan.net/poem30.html")
a = a.text
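One likely cause, consistent with the garbled output above: the page's bytes are windows-1256 while a.text decodes them as ISO-8859-1 (Latin-1), which is what requests falls back to when a text/* response's Content-Type header names no charset. The offline sketch below reproduces that mismatch using only the standard library. The sample word and the helper names garble and repair are illustrative assumptions, and no request is sent to the site.

```python
# Offline reproduction of the mojibake: Arabic text stored as windows-1256
# bytes, then decoded with the wrong codec (ISO-8859-1 / Latin-1).
# Assumption: the server sends windows-1256 bytes and the client decodes them
# as Latin-1, as requests does for text/* responses with no declared charset.


def garble(text: str) -> str:
    """Encode as windows-1256, then wrongly decode as Latin-1."""
    raw = text.encode("windows-1256")  # the bytes the server would send
    return raw.decode("latin-1")       # what .text would show in that case


def repair(garbled: str) -> str:
    """Undo the wrong decode. Latin-1 maps every byte to exactly one
    character, so re-encoding recovers the original bytes unchanged."""
    return garbled.encode("latin-1").decode("windows-1256")


if __name__ == "__main__":
    original = "مرحبا"  # hypothetical sample word ("hello")
    broken = garble(original)
    print(broken)          # Latin-1 look-alike characters, not Arabic
    print(repair(broken))  # the original Arabic word again
```

Because the Latin-1 step is lossless, repair can even recover text that was already mis-decoded this way, though decoding correctly in the first place is cleaner.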

Upvotes: 2

Views: 911

Answers (2)

Sathish J

Reputation: 1

Try this:

from bs4 import BeautifulSoup
import requests

a = requests.get("https://www.aldiwan.net/poem30.html")
a = a.content.decode('utf-8')
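Whether this works depends on the bytes the server actually sends. The next answer reports that decoding this page as UTF-8 kept failing, which is what a strict UTF-8 decode does (it raises UnicodeDecodeError) when the bytes are windows-1256. A minimal sketch that tries UTF-8 first and falls back to windows-1256, assuming those are the only two candidates; decode_with_fallback is a hypothetical helper, not part of requests.

```python
# Try candidate encodings in order and keep the first strict (error-free) decode.
# Assumption: the page is either UTF-8 or windows-1256; extend the tuple if needed.
from __future__ import annotations

DEFAULT_CANDIDATES = ("utf-8", "windows-1256")


def decode_with_fallback(raw: bytes, candidates: tuple[str, ...] = DEFAULT_CANDIDATES) -> tuple[str, str]:
    """Return (text, encoding_used) for the first candidate that decodes raw cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
    raise ValueError(f"none of {candidates} could decode the data")


if __name__ == "__main__":
    # Stand-in for a.content: Arabic text encoded as windows-1256 (assumed).
    raw = "مرحبا".encode("windows-1256")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 failed:", exc.reason)
    text, used = decode_with_fallback(raw)
    print(used, text)
```

The order matters: windows-1256 assigns a character to almost every byte value, so it rarely raises and would quietly turn UTF-8 bytes into nonsense. Keeping the stricter UTF-8 first means windows-1256 is only used as a last resort.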

Upvotes: 0

Shuvojit

Reputation: 1470

You have to decode a.content, not a.text.

I tried decoding it as UTF-8, but that kept failing, so I looked at the page at that URL: it uses a specific charset, windows-1256.


I used that same charset to decode a.content, and voilà, the Arabic text came through correctly.
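The charset found on the page can also be read programmatically instead of by hand. A minimal sketch, assuming the page declares its encoding in a meta tag (either <meta charset="..."> or the http-equiv Content-Type form) within its first few kilobytes; sniff_meta_charset and decode_page are hypothetical helpers, and the regular expression is a simplification, not a full HTML parser.

```python
# Sketch: read the charset a page declares in its HTML and decode with it.
# Assumption: the declaration is a <meta> tag near the top of the document.
from __future__ import annotations

import re

# Matches charset=... in either meta form; charset names are plain ASCII.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-:.]+)""", re.IGNORECASE)


def sniff_meta_charset(raw: bytes, limit: int = 4096) -> str | None:
    """Return the charset declared in a <meta> tag (lowercased), or None."""
    match = _META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_page(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw HTML bytes with the declared charset, else with the default."""
    charset = sniff_meta_charset(raw) or default
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # declared a name Python does not recognise
        return raw.decode(default, errors="replace")


if __name__ == "__main__":
    page = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1256"><p>مرحبا</p>'
    raw = page.encode("windows-1256")  # stand-in for a.content
    print(sniff_meta_charset(raw))
    print(decode_page(raw))
```

With requests, decode_page(a.content) would decode the page this way. Alternatively, once the charset is known, setting a.encoding = "windows-1256" before reading a.text has the same effect, because .text decodes .content using a.encoding.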


Upvotes: 1
