user14904935
user14904935

Reputation:

Handling data pulled from MySQL database in Python

I am trying to figure out a good way to handle blacklists for words via a MySQL database. I have hit a roadblock when it comes to handling the data returned from the database.

cursor.execute('SELECT word FROM blacklist')
blacklist1 = []
for word in cursor.fetchall():
   if word in blacklist1:
      return
   else:
      blacklist1.append(word)

The above code is what I am using to pull the info which I know works. However, I need some help with converting this:

[('word1',), ('word2',), ('word3',), ('word4',), ('word5',)]

into this:

['word1', 'word2', 'word3', 'word4', 'word5']

my biggest issue is that I need it to scale so that it will check each word within the blacklist from no words to several thousand if necessary. I know a for loop would work when it comes to checking them versus the message it checks. but I know I will not be able to check the words till it is a normal list. any help would be appreciated.

Upvotes: 0

Views: 112

Answers (3)

Woody1193
Woody1193

Reputation: 7980

First, your actual problem is that the cursor is a wrapper of an iterator over rows returned from MySQL, so it can be operated on similarly to a list of tuples. That being said, my advice would be to split your "business" logic from your data access logic. This might seem trivial but it will make debugging much easier. The overall approach will look like this:

def get_from_database():
    cursor.execute('SELECT word FROM blacklist')
    return [row[0] for row in cursor.fetchall()]

def get_blacklist():
    words = get_from_database()
    return list(set(words))

In this approach, get_from_database retrieves all the words from MySQL and returns them in the format your program needs. get_blacklist encapsulates this logic and also makes the returned list unique. So now, if there's a bug, you can verify each independently.

Upvotes: 0

Parados
Parados

Reputation: 23

Your list currently contains one element tuples. If you want to extract the strings you could try this:

blacklist1 = []
for word_tuple in cursor.fetchall():
   if word_tuple[0] in blacklist1:
      return
   else:
      blacklist1.append(word_tuple[0])

For your use case you might also benefit from having blacklist1 be a set, that way you can check for membership in O(1) time:

blacklist1 = set()
for word_tuple in cursor.fetchall():
   if word_tuple[0] in blacklist1:
      return
   else:
      blacklist1.add(word_tuple[0])

Upvotes: 0

nanofarad
nanofarad

Reputation: 41281

In each iteration of for word in cursor.fetchall(), the variable word is a tuple, or a collection of values. This is documented here.

These correspond to each column returned, i.e. if you had a second column in your select statement ('SELECT word, replacement FROM blacklist') you would get tuples of two elements.

Use a set, and add the one and only element of the tuple, instead of the tuple itself:

for word_tuple in cursor.fetchall():
  blacklist1.add(word[0])

Looking at the code more closely, if word in blacklist1: return may be a logical error - as soon as you see a duplicate, you'll stop reading rows from the database. You were likely looking to just skip that duplicate - you don't actually need that logic anymore because sets automatically remove duplicates.

Upvotes: 1

Related Questions