TheRealG

Reputation: 11

UnicodeDecodeError: 'charmap' codec can't decode byte Z in position Y: character maps to <undefined>

I'm attempting to perform sentiment analysis using a large training dataset. The problem is that when I perform the analysis using the 'sampleTweets.csv', everything turns out okay except that the analysis is not accurate because the sampleTweets dataset is too small.

When I use a larger dataset such as 'full_training_dataset.csv', I get the following error

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6961: character maps to <undefined>

I've tried adding encoding="utf-8" and other encodings such as latin-1, but when I do that, the program keeps running without producing any output in the console.

The code is below; the project is on GitHub at https://github.com/ravikiranj/twitter-sentiment-analyzer, and I'm using the simpleDemo.py file.

import csv

# Read the tweets one by one and process them; getStopWordList, processTweet
# and getFeatureVector are helper functions from the project's simpleDemo.py
inpTweets = csv.reader(open('data/full_training_dataset.csv', 'r'), delimiter=',', quotechar='|')
stopWords = getStopWordList('data/feature_list/stopwords.txt')
count = 0
featureList = []
tweets = []
for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment))
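
For reference, this is roughly what the encoding attempt looked like (a sketch rather than my exact code; the with block and newline='' are just the csv-module idiom, added here for completeness):

import csv

# Sketch of the attempt described above: pass an explicit encoding to open().
# With encoding='utf-8' (or 'latin-1') the charmap error no longer appears,
# but the script then runs without printing anything to the console.
with open('data/full_training_dataset.csv', 'r', encoding='utf-8', newline='') as f:
    inpTweets = csv.reader(f, delimiter=',', quotechar='|')
    for row in inpTweets:
        sentiment, tweet = row[0], row[1]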

Upvotes: 1

Views: 704

Answers (1)

I know this is an old post, but this worked for me.

  1. Go to your Python installation:

    example: C:\Python\Python37-32\Lib\site-packages\stopwordsiso

  2. Open __init__.py

  3. Change

    with open(STOPWORDS_FILE) as json_data:

    to

    with open(STOPWORDS_FILE, encoding="utf8") as json_data:
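
For context, the "charmap" codec shows up because on Windows open() without an encoding argument falls back to the locale codec (usually cp1252), whose decoder is codecs.charmap_decode and which has no mapping for bytes such as 0x9d. A minimal illustration (the file name here is hypothetical):

    # Without encoding=..., open() uses locale.getpreferredencoding(), which
    # is typically cp1252 on Windows; bytes like 0x9d have no mapping there
    # and raise UnicodeDecodeError. Passing the encoding explicitly avoids it.
    with open("tweets_utf8.csv", encoding="utf8") as f:  # hypothetical file
        text = f.read()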

Upvotes: 1
