user7123764

How to solve python 'utf-8' error?

I am trying to read a 6 GB file in a Python 3 terminal, but the line that reads the file fails. The code is below:

#define data directory

data_dir = 'C://Star/star_data/csv\Globe'

#read the review dataset
yelp = pd.read_csv(data_dir+'\star_data_python.csv')
X, y = star.data, star.target
X.shape

error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-bc09b45c73bb> in <module>()
      4 
      5 #read the review dataset
----> 6 yelp = pd.read_csv(data_dir+'\star_data_python.csv')
      7 X, y = star.data, star.target
      8 X.shape

What could be the problem? Thanks.

Upvotes: 1

Views: 14327

Answers (1)

Kruupös

Reputation: 5484

Put an r prefix in front of the path string, since you are on Windows:

e.g.

data_dir = r'C://Star/star_data/csv/Globe'

The 'r' prefix makes the string a raw string: backslashes are kept as literal characters instead of starting escape sequences such as \t or \n.
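To make the difference concrete, here is a small sketch comparing a plain and a raw string literal. The path 'C:\temp\new' is purely illustrative (it is not from the question); it was picked because \t and \n are real escape sequences.

```python
# Illustrative path only: '\t' and '\n' are valid escape sequences.
plain = 'C:\temp\new'    # '\t' becomes a tab, '\n' becomes a newline
raw = r'C:\temp\new'     # backslashes are kept literally

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))  # 9 11
```

Note that in the question's path, \G and \s are not recognized escape sequences, so Python leaves those backslashes in place; the raw prefix is good practice there, but it is unlikely to be what triggers the UnicodeDecodeError.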

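The UnicodeDecodeError itself means the file contains bytes that are not valid UTF-8, which is the encoding read_csv assumes by default. Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252'; these are encodings commonly found on Windows (latin1 and iso-8859-1 are two names for the same encoding).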

e.g.

full_path = data_dir + r'/star_data_python.csv'
pd.read_csv(full_path, encoding='latin1')
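If you are not sure which of these encodings fits, one option is to test them before handing the file to pandas. The following is only a rough sketch using the standard library: find_working_encoding, its candidate list and block_size are hypothetical names chosen for this example, not part of pandas. It reads the file in blocks, so a 6 GB file is never loaded into memory at once.

```python
def find_working_encoding(path, candidates=("utf-8", "cp1252", "latin1"),
                          block_size=1 << 20):
    """Return the first candidate that decodes the whole file, or None.

    Hypothetical helper for illustration; not a pandas function.
    """
    for enc in candidates:
        try:
            with open(path, encoding=enc, newline="") as fh:
                # Read block by block; the decoded text is discarded,
                # we only care whether decoding raises an error.
                while fh.read(block_size):
                    pass
            return enc
        except UnicodeDecodeError:
            # This encoding cannot decode the file; try the next one.
            continue
    return None
```

The result can then be passed on, for example pd.read_csv(full_path, encoding=find_working_encoding(full_path)). Keep in mind that latin1 can decode any byte sequence, so it always "works" as a last resort; a successful read does not guarantee that every character comes out correctly.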


Upvotes: 1
