Reputation: 781
This code is supposed to get a string value from an Excel file. However, the value is not being recognized as a string. How can I get query as a string? str(string) doesn't seem to work.
def main():
    file_location = "/Users/ronald/Desktop/Twitter/TwitterData.xlsx"
    workbook = xlrd.open_workbook(file_location) #open work book
    worksheet = workbook.sheet_by_index(0)
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    curr_row = 0
    curr_cell = 3
    count = 0
    string = 'tweet'
    tweets = []
    while curr_row < num_rows:
        curr_row += 1
        tweet = worksheet.cell_value(curr_row, curr_cell)
        tweet.encode('ascii', 'ignore')
        #print tweet
        query = str(tweet)
        if (isinstance(query, str)):
            print "it is a string"
        else:
            print "it is not a string"
This is the error I keep getting:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)
Upvotes: 0
Views: 89
Reputation: 18467
There are two distinct types in Python that both represent strings in different ways.
str (called bytes in Python 3): This is the default in Python 2. It represents a string as a series of bytes, which doesn't work very well for Unicode, because each character is not necessarily one byte as in ASCII and some other encodings.
unicode (called str in Python 3): This is the default in Python 3. Unicode handles characters with accents and international characters, so especially when dealing with something like Twitter, that's what you want. In Python 2, this is also what causes some strings to have the little u'' prefix.
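To make the difference between the two types concrete, here is a minimal sketch that converts between them. The sample text and the variable names are made up for illustration; the only calls used are the standard encode() and decode() methods.

```python
# Text type: unicode in Python 2, str in Python 3 (sample value is hypothetical).
text = u"caf\u00e9 tweet"

# encode() returns a NEW byte string; it does not change `text` in place.
utf8_bytes = text.encode("utf-8")

# With "ignore", characters that ASCII cannot represent are silently dropped.
ascii_bytes = text.encode("ascii", "ignore")

# decode() goes back from bytes to text.
round_trip = utf8_bytes.decode("utf-8")
```

Note that in the question's loop the result of tweet.encode('ascii', 'ignore') is thrown away, so the following str(tweet) still tries an implicit ASCII encode of the original unicode value in Python 2. Assigning the result, for example tweet = tweet.encode('ascii', 'ignore'), would likely avoid the UnicodeEncodeError, at the cost of losing the non-ASCII characters.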
Your "is this a string?" test consists of isinstance(s, str), which only tests for the first type and ignores the other. Instead, you can test against basestring, that is, isinstance(s, basestring), as it is the parent of both str and unicode. This properly answers the question of "is this a string?" for Python 2, and it is why you were getting misleading results.
Note that if you ever migrate to Python 3, basestring does not exist. This is a Python 2 test only.
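If the same check has to run on both versions, one possible approach is to fall back to str when basestring is missing. This is only a sketch; string_types and is_string are names invented here, not part of any library.

```python
try:
    # Python 2: basestring is the parent of both str and unicode.
    string_types = (basestring,)
except NameError:
    # Python 3: basestring no longer exists; str is the text type.
    string_types = (str,)


def is_string(value):
    """Return True if value is a string type on the running interpreter."""
    return isinstance(value, string_types)
```

With this, is_string(tweet) can replace isinstance(query, str) in the loop. Keep in mind that on Python 3 a bytes value is not counted as a string by this helper.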
Upvotes: 1