speeder1987

Reputation: 307

Conditional check for strings not giving the behavior I expect

I have an SQLite database which I am reading data from using Python. I connect to the database and then store the whole comment column in a list called output. Each entry in the list is a string, and I want to isolate the entries in the list which contain only the string '[deleted]'.

To do this I am using a loop to index into the list, and at each index i I compare the string to '[deleted]'. If the string is '[deleted]', the loop should set the corresponding value at index i to 1 in a vector of zeros called deletedFlag. The code I am using is below:

deletedFlag = np.zeros((len(output),1))
for i in range(0,len(output)):
    if (output[i] == "[deleted]"):
        deletedFlag[i] = 1

The issue is that output[i] == '[deleted]' never evaluates to True, so deletedFlag[i] is never set to 1.

Investigating further, I printed output[i] to the console for a value of i that I know contains the '[deleted]' string, and the result is slightly different from what I expect:

>>> print(output[3])
('[deleted]',)

However even if I change my string comparison to be the same as the printed value I still get the same behavior of the deletedFlag vector remaining at all zeros:

deletedFlag = np.zeros((len(output),1))
for i in range(0,len(output)):
    if (output[i] == "('[deleted]',)"):
        deletedFlag[i] = 1

Looking at the first four database entries in DB Browser shows the following (taken from a screenshot); row 4 of the comment column is the one I am trying to identify:

[Screenshot: first four rows of the comment column in DB Browser]

I assume I am just doing the string comparison wrong, but for the life of me I can't work out what it should be, and I have tried most permutations of brackets and inverted commas I can think of. I understand that this is probably a really basic issue, but any help would be greatly appreciated!

Upvotes: 2

Views: 34

Answers (1)

garglblarg

Reputation: 550

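Well ... that printed output[3] looks like a one-element tuple, most likely because the database cursor hands back each row as a tuple even when you select a single column. Comparing a tuple to a string is always False, no matter what the string says, so you need to check output[i][0] instead.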
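To see why neither comparison in the question can succeed, here is a minimal sketch. The row value is hypothetical and hard-coded to match what the question printed for output[3]; it is not read from the real database.

```python
# Hypothetical row, shaped like the value printed for output[3] in the question
row = ('[deleted]',)

# A tuple never equals a string, whatever the string contains
tuple_vs_string = (row == "[deleted]")
repr_vs_string = (row == "('[deleted]',)")

# Indexing into the tuple gives the string itself, so this comparison works
element_vs_string = (row[0] == "[deleted]")

print(tuple_vs_string, repr_vs_string, element_vs_string)  # False False True
```

The second comparison fails for the same reason as the first: the right-hand side is a string that merely looks like the printed tuple.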

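Also, I'd suggest using the in operator for the string comparison (it calls __contains__ under the hood). Note that "deleted" in s is a substring match, so use == "[deleted]" if you need an exact match. And you don't need to write range(0,n): by default it starts at zero, so a simple range(n) does the exact same thing ;>

deletedFlag = np.zeros((len(output),1))
for i in range(len(output)):
    # each row is a one-element tuple, so take its first item
    if "deleted" in output[i][0]:
        deletedFlag[i] = 1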
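For anyone who wants to reproduce this end to end, below is a rough self-contained sketch using the standard sqlite3 module. The in-memory table, its name (comments) and the sample rows are all made up for illustration, and I'm assuming the data was fetched with fetchall(); the flags are kept in a plain list rather than a NumPy array so the snippet has no extra dependencies.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (comment TEXT)")
conn.executemany(
    "INSERT INTO comments (comment) VALUES (?)",
    [("first",), ("second",), ("third",), ("[deleted]",)],
)

# fetchall() returns a list of row tuples, even for a single selected column
output = conn.execute("SELECT comment FROM comments ORDER BY rowid").fetchall()

# Unpack each one-element tuple and compare the string inside it exactly
deleted_flag = [1 if comment == "[deleted]" else 0 for (comment,) in output]

print(output[3])      # ('[deleted]',)
print(deleted_flag)   # [0, 0, 0, 1]
conn.close()
```

If you need the NumPy vector from the question, passing deleted_flag to np.array should get you there. Exact equality is used here because the question asks for entries that contain only '[deleted]'.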

Upvotes: 1
