Reputation: 79
I am trying to generate random words for a password generator library. I could ship a really long list of words, but I think that's really inefficient. I have also tried downloading a word list from a URL, like so:
import urllib.request
word_url = "http://svnweb.freebsd.org/csrg/share/dict/words?view=co&content-type=text/plain"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
req = urllib.request.Request(word_url, headers=headers)
response = urllib.request.urlopen(req)
long_txt = response.read().decode()
words = long_txt.splitlines()
However, a commenter pointed out that downloading the list is a likely place for someone to attack the library. Full comment:
You're getting your dictionary from the web? That's one of the first places I would attack this, through DNS spoofing or flat-out interception. Don't you have a standard system wordlist (e.g. /usr/share/dict/words) that you could use instead?
The user above suggested using a standard system wordlist. How could I use a standard system wordlist to generate some words? Or are there better ways to do it? Thanks.
Edit: The above user suggested a standard system wordlist, but I am on Windows. I don't know whether that applies to Windows, but just so you know.
Upvotes: 1
Views: 4617
Reputation: 1722
You could get a random word from the Moby Dick book (or any other Gutenberg book) as follows:
import nltk
nltk.download('gutenberg')
from nltk.corpus import gutenberg
import random
# Lowercase first, then deduplicate, so each distinct word appears exactly once
moby = sorted({word.lower() for word in gutenberg.words('melville-moby_dick.txt') if len(word) > 2})
random_word = random.choice(moby)
Note that I've taken the set of the words in moby so that every distinct word is equally likely to be picked.
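Since the words are meant for passwords, it may be worth drawing them with the standard library's `secrets` module rather than `random`, which is not designed for security use. Below is a minimal sketch of that idea, reading from a local word list file instead of the network. The helper names `load_words` and `make_passphrase`, the `min_len` filter, and the one-word-per-line file format are my own assumptions, not something from the question.

```python
import secrets
from pathlib import Path


def load_words(path, min_len=3):
    """Read one word per line; keep alphabetic words, lowercased and deduplicated.

    `path` is whatever local word list you have (hypothetical helper).
    """
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    words = {line.strip().lower() for line in text.splitlines()}
    return sorted(w for w in words if w.isalpha() and len(w) >= min_len)


def make_passphrase(words, count=4, sep="-"):
    """Join `count` words, each drawn with the OS-backed secure generator."""
    return sep.join(secrets.choice(words) for _ in range(count))
```

On Linux or macOS you could try `load_words("/usr/share/dict/words")` if that file exists; on Windows there is no such standard file as far as I know, so you would point it at a word list bundled with your library. The same `make_passphrase` call should also work on the `moby` list built above.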
Upvotes: 1