Reputation: 155
I'm doing text analysis. My task is to count how many times each 'bad word' from a list appears in the strings of a dataframe column. The only approach I can think of is to use .isin() or .str.contains() to check word by word, but the list has over 40000 words, so the loop would be too slow. Is there a better way to do this?
Upvotes: 1
Views: 179
Reputation: 515
Although you said a loop might be too slow, it still seems like the most practical approach given the size of the list. I've tried to keep it as simple as possible. Feel free to modify the print statement to suit your needs.
text = 'Bad Word test for Terrible Word same as Horrible Word and NSFW Word and Bad Word again'
bad_words = ['Bad Word', 'Terrible Word', 'Horrible Word', 'NSFW Word']
length_list = []
for i in bad_words:
    # str.count returns the number of non-overlapping, case-sensitive matches
    count = text.count(i)
    length_list.append([i, count])
print(length_list)
Output:
[['Bad Word', 2], ['Terrible Word', 1], ['Horrible Word', 1], ['NSFW Word', 1]]
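Since the question is about a dataframe column, the same loop can be wrapped in a function and run once per row. This is only a minimal sketch: the function name `count_bad_words` and the sample `rows` list are made up here, with `rows` standing in for the string values of the column.

```python
def count_bad_words(text, bad_words):
    """Return {word: count} for every bad word that occurs in text."""
    counts = {}
    for word in bad_words:
        n = text.count(word)  # non-overlapping, case-sensitive substring count
        if n:
            counts[word] = n
    return counts


bad_words = ['Bad Word', 'Terrible Word', 'Horrible Word', 'NSFW Word']

# Hypothetical sample data standing in for the values of a dataframe column.
rows = [
    'Bad Word test for Terrible Word',
    'nothing to see here',
    'Bad Word and Bad Word again',
]

per_row = [count_bad_words(row, bad_words) for row in rows]
print(per_row)
# [{'Bad Word': 1, 'Terrible Word': 1}, {}, {'Bad Word': 2}]
```

With pandas, the same function could be passed to Series.apply, for example df['text'].apply(lambda s: count_bad_words(s, bad_words)), where df and the column name 'text' are assumptions about your data.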
Alternatively, to print each count as a string:
for i in bad_words:
    count = text.count(i)
    print(i + ' count: ' + str(count))
Output:
Bad Word count: 2
Terrible Word count: 1
Horrible Word count: 1
NSFW Word count: 1
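If the per-word loop does turn out to be too slow for a list of 40000 entries, one alternative (a different technique from the loop above) is to scan the text once with a single compiled regular expression and tally the matches. This is a sketch under assumptions: the helper name `build_pattern` is made up here, and matching is case-sensitive. Because the regex finds non-overlapping matches across the whole list, counts can differ from str.count when one bad word overlaps or contains another. Whether it is actually faster depends on the data, so it is worth timing both.

```python
import re
from collections import Counter


def build_pattern(bad_words):
    # Longest first, so a longer phrase wins over a shorter one that is its prefix.
    ordered = sorted(bad_words, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '*' literal.
    return re.compile('|'.join(re.escape(w) for w in ordered))


text = 'Bad Word test for Terrible Word same as Horrible Word and NSFW Word and Bad Word again'
bad_words = ['Bad Word', 'Terrible Word', 'Horrible Word', 'NSFW Word']

pattern = build_pattern(bad_words)
counts = Counter(pattern.findall(text))  # one pass over the text

for word in bad_words:
    # A Counter returns 0 for words that never matched.
    print(word + ' count: ' + str(counts[word]))
```

The pattern only needs to be compiled once and can then be reused for every row of the column.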
Upvotes: 1