matanslook

Reputation: 53

How to fix gibberish in place of Hebrew strings in Python?

I'm trying to automate an email service that sends each person their bus station.

In order to do so I need to pull some data from a Hebrew website, but all I get is a file with gibberish in it.

I have tried encoding to UTF-8, but that only produces more gibberish.

import requests
import pandas as pd

url = 'http://yit.maya-tour.co.il/yit-pass/Drop_Report.aspx?client_code=2660&coordinator_code=2669'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')

I expected the following:

רשימת פיזורים

שם הנהג סוג הרכב הערות תאור שעה

מוניות הקניון מונית A35 פיזור-שדרות 06:30

but got:

               ×©× ×× ×× ×¡×× ×ר××  ...               ת××ר שע×
0  ××× ××ת ×קנ×××      ××× ×ת  ...  פ×××ר-ש×ר×ת  06:30

Upvotes: 3

Views: 340

Answers (1)

Alex

Reputation: 2402

A response object's .content property gives you the raw bytes, whereas .text gives you those bytes decoded to a string. Try using .text instead:

html = requests.get(url).text

More detail here: What is the difference between 'content' and 'text'
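
One caveat: as far as I know, requests falls back to ISO-8859-1 for .text when the server sends a text content type without a charset, so .text can come back garbled too. The output in the question looks like UTF-8 bytes decoded with a single-byte Latin codec. Here is a minimal offline sketch of that effect and how to reverse it. The sample string is taken from the question; the variable names are just for illustration, and I am assuming the page really is served as UTF-8.

```python
# Hebrew text as the server would send it, assuming the page is UTF-8.
original = "רשימת פיזורים"
raw = original.encode("utf-8")

# Decoding the bytes with the wrong codec produces mojibake
# similar to the output shown in the question.
garbled = raw.decode("latin-1")

# Decoding with the right codec gives the Hebrew back.
fixed = raw.decode("utf-8")

# Text that is already garbled can be repaired by undoing the wrong
# decode and then decoding the recovered bytes as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(fixed)
print(repaired)
```

With requests, the equivalent would be to set response.encoding = 'utf-8' before reading response.text, or to call response.content.decode('utf-8') yourself, and then pass the resulting string to pd.read_html.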

Upvotes: 2
