Reputation:
I am trying to figure out a good way to handle blacklists for words via a MySQL database. I have hit a roadblock when it comes to handling the data returned from the database.
cursor.execute('SELECT word FROM blacklist')
blacklist1 = []
for word in cursor.fetchall():
if word in blacklist1:
return
else:
blacklist1.append(word)
The above code is what I am using to pull the info which I know works. However, I need some help with converting this:
[('word1',), ('word2',), ('word3',), ('word4',), ('word5',)]
into this:
['word1', 'word2', 'word3', 'word4', 'word5']
my biggest issue is that I need it to scale so that it will check each word within the blacklist from no words to several thousand if necessary. I know a for loop would work when it comes to checking them versus the message it checks. but I know I will not be able to check the words till it is a normal list. any help would be appreciated.
Upvotes: 0
Views: 112
Reputation: 7980
First, your actual problem is that the cursor is a wrapper of an iterator over rows returned from MySQL, so it can be operated on similarly to a list of tuples. That being said, my advice would be to split your "business" logic from your data access logic. This might seem trivial but it will make debugging much easier. The overall approach will look like this:
def get_from_database():
cursor.execute('SELECT word FROM blacklist')
return [row[0] for row in cursor.fetchall()]
def get_blacklist():
words = get_from_database()
return list(set(words))
In this approach, get_from_database
retrieves all the words from MySQL and returns them in the format your program needs. get_blacklist
encapsulates this logic and also makes the returned list unique. So now, if there's a bug, you can verify each independently.
Upvotes: 0
Reputation: 23
Your list currently contains one element tuples. If you want to extract the strings you could try this:
blacklist1 = []
for word_tuple in cursor.fetchall():
if word_tuple[0] in blacklist1:
return
else:
blacklist1.append(word_tuple[0])
For your use case you might also benefit from having blacklist1
be a set, that way you can check for membership in O(1) time:
blacklist1 = set()
for word_tuple in cursor.fetchall():
if word_tuple[0] in blacklist1:
return
else:
blacklist1.add(word_tuple[0])
Upvotes: 0
Reputation: 41281
In each iteration of for word in cursor.fetchall()
, the variable word
is a tuple, or a collection of values. This is documented here.
These correspond to each column returned, i.e. if you had a second column in your select statement ('SELECT word, replacement FROM blacklist')
you would get tuples of two elements.
Use a set, and add the one and only element of the tuple, instead of the tuple itself:
for word_tuple in cursor.fetchall():
blacklist1.add(word[0])
Looking at the code more closely, if word in blacklist1: return
may be a logical error - as soon as you see a duplicate, you'll stop reading rows from the database. You were likely looking to just skip that duplicate - you don't actually need that logic anymore because sets automatically remove duplicates.
Upvotes: 1