Reputation: 307
I have an SQLite database that I am reading from using Python. I connect to the database and store the whole comment column in a list called output. Each entry in the list is a string, and I want to isolate the entries that contain only the string '[deleted]'.
To do this I loop over the list and, at each index i, compare the string at that index to '[deleted]'. If they match, the loop should set a corresponding value of 1 at index i in a vector of zeros called deletedFlag. The code I am using is below:
deletedFlag = np.zeros((len(output),1))
for i in range(0,len(output)):
    if (output[i] == "[deleted]"):
        deletedFlag[i] = 1
The issue is that output[i] == '[deleted]' never evaluates to True, so deletedFlag[i] is never set to 1.
Investigating further, I printed output[i] for a value of i that I know contains the '[deleted]' string, and the result is slightly different from the string I expect:
>> print(output[3])
>> ('[deleted]',)
However, even if I change my string comparison to match the printed value, I still get the same behavior: the deletedFlag vector remains all zeros:
deletedFlag = np.zeros((len(output),1))
for i in range(0,len(output)):
    if (output[i] == "('[deleted]',)"):
        deletedFlag[i] = 1
Looking at the first four database entries in DB Browser (screenshot not reproduced here), row 4 of the comment column is the one I am trying to identify.
I assume I am just doing the string comparison wrong, but for the life of me I can't work out what it should be, and I have tried most permutations of brackets and inverted commas I can think of. I understand that this is probably a really basic issue, but any help would be greatly appreciated!
Upvotes: 2
Views: 34
Reputation: 550
Well ... that printed output[3] is a one-element tuple rather than a string: the cursor returns each row as a tuple, even when you select a single column. So you need to check output[i][0] instead.
Also, I'd suggest using the in operator for the string comparison if a substring match is enough (stick with == if the comment must be exactly '[deleted]'). And you don't need to write range(0,n): by default it starts at zero, so a simple range(n) does the exact same thing ;>
deletedFlag = np.zeros((len(output),1))
for i in range(len(output)):
    if "deleted" in output[i][0]:
        deletedFlag[i] = 1
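To see why both comparisons in the question fail, here is a minimal sketch that reproduces the situation. It assumes the data was read with Python's built-in sqlite3 module and a default cursor; the in-memory table, its name (comments), and the sample rows are made up for illustration and are not the asker's real data.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (comment TEXT)")
conn.executemany(
    "INSERT INTO comments (comment) VALUES (?)",
    [("first",), ("second",), ("third",), ("[deleted]",)],
)

# With the default row factory, fetchall() gives one tuple per row,
# even when only a single column is selected.
output = conn.execute("SELECT comment FROM comments ORDER BY rowid").fetchall()
conn.close()

row = output[3]
tuple_vs_str = row == "[deleted]"        # tuple compared with a str
tuple_vs_repr = row == "('[deleted]',)"  # tuple compared with its printed form
first_field = row[0] == "[deleted]"      # str compared with a str

# Unpack each one-element row, then flag exact matches with 1 and others with 0.
comments = [comment for (comment,) in output]
deleted_flag = [1 if comment == "[deleted]" else 0 for comment in comments]
```

Only the third comparison is between two strings, so only it can match. If the column-vector shape from the question is needed, np.array(deleted_flag).reshape(-1, 1) should give the same layout as the np.zeros((len(output),1)) version.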
Upvotes: 1