Recognizing value as a string

Question

This code is supposed to get a string value from an Excel file. However, the value is not being recognized as a string. How can I get query as a string? str(string) doesn't seem to work.

import xlrd

def main():
    file_location = "/Users/ronald/Desktop/Twitter/TwitterData.xlsx" 
    workbook = xlrd.open_workbook(file_location) #open work book
    worksheet = workbook.sheet_by_index(0)
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    curr_row = 0
    curr_cell = 3
    count = 0
    string = 'tweet'
    tweets = []
    while curr_row < num_rows:
        curr_row += 1
        tweet = worksheet.cell_value(curr_row, curr_cell)
        tweet.encode('ascii', 'ignore')
        #print tweet
        query = str(tweet)
        if (isinstance(query, str)):
            print "it is a string"
        else:
            print "it is not a string"

This is the error I keep getting.

UnicodeEncodeError: 'ascii' codec can't encode characters in position 102-104: ordinal not in range(128)

Two-Bit Alchemist · Accepted Answer

There are two distinct types in Python that both represent strings in different ways.

str (Python 2) or bytes (Python 3): This is the default string type in Python 2. It represents a string as a series of bytes, which works poorly for Unicode text because a character is not necessarily one byte, as it is in ASCII and some other encodings.
unicode (Python 2) or str (Python 3): This is the default string type in Python 3. It handles accented and international characters, so it is what you want when dealing with something like Twitter. In Python 2, values of this type are displayed with the little u'' prefix.
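
As a rough illustration of the two types, the sketch below encodes a made-up unicode value to bytes. The tweet text is hypothetical (it is not taken from the spreadsheet), and the code is written so that it should run on both Python 2 and Python 3. Note that encode() returns a new value and leaves the original untouched, so the result has to be assigned to something.

```python
# Hypothetical cell value containing non-ASCII characters.
tweet = u"caf\u00e9 \u2615 time"

# encode() does not modify tweet in place; keep the return value.
# 'ignore' silently drops every character that ASCII cannot represent.
ascii_bytes = tweet.encode("ascii", "ignore")

# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = tweet.encode("utf-8")

# Decoding the UTF-8 bytes gives back the original text.
round_trip = utf8_bytes.decode("utf-8")

print(repr(ascii_bytes))
print(round_trip == tweet)
```

Whether dropping characters with 'ignore' is acceptable depends on what the tweets are used for; encoding to UTF-8 keeps the text intact.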

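Your "is this a string?" test consists of isinstance(s, str), which only tests for the first type and ignores the other. Instead, you can test against basestring -- isinstance(s, basestring) -- as it is the parent of both str and unicode. This properly answers the question of "is this a string?" for Python 2, and it is why you were getting misleading results. The UnicodeEncodeError itself comes from str(tweet): in Python 2, calling str() on a unicode value implicitly encodes it as ASCII, which fails as soon as the text contains a non-ASCII character.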

Note that if you ever migrate to Python 3, basestring does not exist. This is a Python 2 test only.
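
If the same check has to survive a move to Python 3, one possible approach is to pick the base type by interpreter version. This is only a sketch: string_types and is_string are names made up for this example, not part of any library.

```python
import sys

# basestring exists only in Python 2. On Python 3, str is the text type.
# The basestring name is only looked up when running under Python 2.
if sys.version_info[0] == 2:
    string_types = (basestring,)
else:
    string_types = (str,)


def is_string(value):
    """Return True if value is a text string on the running interpreter."""
    return isinstance(value, string_types)


print(is_string(u"caf\u00e9"))
print(is_string(42))
```

On Python 2 this accepts both str and unicode values; on Python 3 it accepts str only, so bytes values are not treated as strings there.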
