Reputation: 394
I noticed that when I use the fake.word() function with locale set to 'pl_PL' it sometimes generates a swear word which is not ideal for me. Is there an easy way to force Faker to stop outputting swear words, preferably without having to list all of the swear words myself?
Upvotes: 0
Views: 718
Reputation: 13242
The original word list comes from Wiktionary, and the Wiktionary page has been updated since Faker's copy was made.
We can fetch the current list ourselves, though, using the API:
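import requests
from bs4 import BeautifulSoup
page = 'Indeks%3APolski_-_Najpopularniejsze_s%C5%82owa_1-2000'
method = 'html'
url = f"https://pl.wiktionary.org/api/rest_v1/page/{method}/{page}"
r = requests.get(url)
r.raise_for_status()  # fail early on an HTTP error
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")
# Only take anchors that carry a title attribute, so x['title'] cannot raise KeyError
words = [x['title'] for x in soup.find_all('a', title=True)]
print(words[:50])
print(len(words))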
Output (First 50):
['w', 'z', 'być', 'na', 'i', 'do', 'nie', 'który', 'lub', 'to', 'się', 'o', 'mieć', 'coś', 'ten', 'dotyczyć', 'on', 'od', 'co', 'język', 'po', 'że', 'ktoś', 'przez', 'osoba', 'miasto', 'jeden', 'jak', 'za', 'ja', 'rok', 'a', 'bardzo', 'swój', 'dla', 'taki', 'człowiek', 'cecha', 'kobieta', 'mój', 'część', 'związany', 'móc', 'dwa', 'ona', 'związać', 'ze', 'mały', 'jakiś', 'miejsce']
2000
Then, we could replace the word list like so:
from faker.providers.lorem.pl_PL import Provider as PLProvider
PLProvider.word_list = tuple(words)
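Scraped link titles can contain duplicates or stray whitespace, so it may be worth tidying the list before assigning it. A minimal sketch, assuming `words` is the list built above; `clean_word_list` is a hypothetical helper name, not part of Faker:

```python
def clean_word_list(words):
    """Strip whitespace, drop empty entries, and remove duplicates, keeping first-seen order."""
    stripped = (w.strip() for w in words)
    # dict.fromkeys keeps only the first occurrence of each key and preserves insertion order
    return tuple(dict.fromkeys(w for w in stripped if w))


# Small stand-in for the scraped list (sample data for illustration)
sample = ["w", "z", " być", "w", "", "na"]
print(clean_word_list(sample))
```

The result could then be assigned with `PLProvider.word_list = clean_word_list(words)`.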
Upvotes: 2
Reputation: 19300
Unfortunately I do not know of a way to remove swear words without typing them out.
One option is to remove the swear words from the word list of the pl_PL lorem Provider class.
from faker.providers.lorem.pl_PL import Provider as PLProvider
bad_words = ["kurwa"]
PLProvider.word_list = tuple(word for word in PLProvider.word_list if word not in bad_words)
(I use tuple here because that is the original type of word_list.)
Here is a more complete code example, including an assertion that the bad words are not in the list of possible words.
from faker import Faker
from faker.providers.lorem.pl_PL import Provider as P
bad_words = ["kurwa"]
# Replace the class-level word list with a filtered copy
P.word_list = tuple(word for word in P.word_list if word not in bad_words)
del P
Faker.seed(0)
fake = Faker(locale="pl_PL")
# Sample 1999 unique words and check that the bad word is not among them
assert "kurwa" not in fake.words(1999, unique=True)
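The same filtering can be wrapped in a small helper that compares words case-insensitively. This is only a sketch: `remove_bad_words` is a hypothetical name, not a Faker function, and you still have to supply the blocklist yourself.

```python
def remove_bad_words(word_list, bad_words):
    """Return word_list as a tuple without any word found in bad_words (case-insensitive)."""
    blocked = {w.casefold() for w in bad_words}  # set for fast membership checks
    return tuple(w for w in word_list if w.casefold() not in blocked)


# Sample data for illustration, not Faker's actual list
print(remove_bad_words(("kot", "Kurwa", "pies"), ["kurwa"]))
```

With Faker, this could be applied as `P.word_list = remove_bad_words(P.word_list, bad_words)`.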
Upvotes: 2