I am trying to read a 6 GB file in a Python 3 terminal, but the read_csv line fails with an error. The code is below:
import pandas as pd

# define data directory
data_dir = 'C://Star/star_data/csv\Globe'
#read the review dataset
yelp = pd.read_csv(data_dir+'\star_data_python.csv')
X, y = star.data, star.target
X.shape
error:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-4-bc09b45c73bb> in <module>()
4
5 #read the review dataset
----> 6 yelp = pd.read_csv(data_dir+'\star_data_python.csv')
7 X, y = star.data, star.target
8 X.shape
What could be the problem? Thanks.
Put an 'r' prefix before your path string, since you are on Windows.
For example:
data_dir = r'C:\Star\star_data\csv\Globe'
The 'r' prefix makes the literal a raw string: backslashes are kept as ordinary characters instead of starting escape sequences such as '\t' (tab) or '\n' (newline).
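A quick way to see the difference is to compare a normal and a raw string holding the same made-up Windows path. In the question's path, '\G' and '\s' are not recognized escape sequences, so Python happens to keep those backslashes (newer versions emit a warning about them), but a folder whose name starts with t or n, as in this hypothetical path, would be silently mangled.

```python
# Hypothetical path chosen so that it contains the escape sequences \t and \n.
normal = 'C:\temp\new_data'  # '\t' becomes a TAB character, '\n' becomes a newline
raw = r'C:\temp\new_data'    # raw string: both backslashes are kept literally

print('\t' in normal, '\n' in normal)  # True True
print(raw.count('\\'))                 # 2
print(len(normal), len(raw))           # 14 16
```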
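The UnicodeDecodeError itself means the file is not valid UTF-8, which is the encoding read_csv assumes by default. Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252'; these are common encodings for text files produced on Windows (in Python, 'latin1' and 'iso-8859-1' are two names for the same codec).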
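To see why these encodings help, the sketch below encodes some accented text as cp1252 (standing in for a file written by a Windows program; the sample text is made up) and then decodes it again. UTF-8 rejects the byte 0xE9 used for 'é', while the single-byte encodings read it back unchanged.

```python
sample_text = 'café;naïve'
raw_bytes = sample_text.encode('cp1252')  # 'é' -> 0xE9, 'ï' -> 0xEF

try:
    raw_bytes.decode('utf-8')  # utf-8 is read_csv's default encoding
except UnicodeDecodeError as exc:
    print('utf-8 failed at byte', exc.start)

for enc in ('cp1252', 'latin1', 'iso-8859-1'):
    print(enc, raw_bytes.decode(enc))  # each of these returns the original text
```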
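For example:
import os
import pandas as pd
full_path = os.path.join(data_dir, 'star_data_python.csv')
yelp = pd.read_csv(full_path, encoding='latin1')  # latin1 maps every byte to a character, so decoding cannot fail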
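For a 6 GB file it can be worth checking which candidate encoding works before committing to a full read. The helper below, guess_encoding, is a hypothetical sketch (not part of pandas): it reads only the first megabyte and returns the first candidate encoding that decodes that sample without errors. A clean sample does not guarantee the rest of the file is clean, and latin1 accepts any byte, so it only makes sense as the last resort in the list.

```python
import codecs


def guess_encoding(path, candidates=('utf-8', 'cp1252', 'latin1'), sample_size=1 << 20):
    """Hypothetical helper: return the first candidate that decodes the file's first sample_size bytes."""
    with open(path, 'rb') as fh:
        sample = fh.read(sample_size)  # avoid loading the whole file
    for enc in candidates:
        decoder = codecs.getincrementaldecoder(enc)()
        try:
            # final=False tolerates a multibyte character cut off at the end of the sample
            decoder.decode(sample, final=False)
            return enc
        except UnicodeDecodeError:
            continue  # this encoding rejects some byte in the sample; try the next one
    return None  # only reachable if no candidate (e.g. no latin1) accepts the sample
```

Assuming full_path is built as above, the result can be passed straight to pandas: pd.read_csv(full_path, encoding=guess_encoding(full_path)).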