Reputation: 3
I'm writing a program that loops through multiple .txt files and searches each one for any of a set of pre-specified keywords. I'm having trouble finding a way to pass the keyword list to the search.
The code below currently returns the following error:
TypeError: 'in <string>' requires string as left operand, not list
I'm aware that the error is caused by the keyword list, but I have no idea how to supply a large list of keywords without triggering it.
Current code:
from os import listdir

keywords = ['Example', 'Use', 'Of', 'Keywords']

with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')
The error occurs on the line directly under the #Error Below comment in the code above. I'm unsure why I can't pass a list as the thing to search for. Any help is appreciated, thanks!
Upvotes: 0
Views: 3987
Reputation: 42143
You could replace
if (keywords in text):
...
with
if any(keyword in text for keyword in keywords):
...
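For illustration, here is a minimal self-contained sketch of that check outside the file loop. The helper names contains_any and matching_keywords and the sample sentence are made up for this example and are not part of the question's code. Note that this is plain case-sensitive substring matching, so 'Of' would also match inside a word like 'Office'.

```python
def contains_any(text, keywords):
    """Return True if at least one keyword occurs in text as a substring."""
    # any() short-circuits: it stops at the first keyword that matches.
    return any(keyword in text for keyword in keywords)


def matching_keywords(text, keywords):
    """Return the keywords that occur in text, in their original order."""
    return [keyword for keyword in keywords if keyword in text]


keywords = ['Example', 'Use', 'Of', 'Keywords']
sample = 'An Example sentence that mentions Keywords.'

print(contains_any(sample, keywords))          # True
print(matching_keywords(sample, keywords))     # ['Example', 'Keywords']
print(contains_any('nothing here', keywords))  # False
```

In the question's loop, contains_any(text, keywords) would take the place of the failing condition; the second helper is only useful if you also want to report which keywords were found.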
Upvotes: 1
Reputation: 1330
I would use regular expressions as they are purpose-built for searching text for substrings.
You only need the re.search block. I added examples of findall and finditer to demystify them.
# let's pretend the 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')
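import re

# add more keywords as needed; each entry is treated as a regex pattern
keywords = [r'publishing', r'industry']
regex = '|'.join(keywords)  # alternation: matches any one of the keywords

for t in text:
    # findall returns every matched substring
    lst = re.findall(regex, t, re.I)  # re.I makes the match case-insensitive
    for el in lst:
        print(el)
    # finditer yields match objects, which also carry the match position
    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())
    # search alone is enough to decide whether any keyword occurs
    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')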
Output:
industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`
industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`
No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`
publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`
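One caveat with joining keywords into a pattern: any keyword containing regex metacharacters (such as '.' or '+') is interpreted as a pattern rather than as literal text. If the keywords are meant literally, a possible variation is to escape each one first. The sketch below assumes that intent; build_pattern and the 'v1.5' keyword are hypothetical names chosen for the example.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword literally."""
    # re.escape makes characters such as '.' match themselves only.
    escaped = (re.escape(keyword) for keyword in keywords)
    return re.compile('|'.join(escaped), re.IGNORECASE)


pattern = build_pattern(['publishing', 'industry', 'v1.5'])

print(bool(pattern.search('Desktop PUBLISHING software')))  # True
print(bool(pattern.search('released as v1.5 today')))       # True
print(bool(pattern.search('released as v1x5 today')))       # False
```

Compiling once and reusing the pattern for every file avoids rebuilding it inside the loop. An empty keyword list would produce an empty pattern, which matches any text, so it is worth guarding against that case separately.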
Upvotes: 0
Reputation: 204
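Try looping through the list and checking whether each element is in the text:
for keyword in keywords:
    if keyword in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break  # one match is enough
else:
    # for ... else: runs only if the loop finished without a break
    f.write('No keyword in ' + filename[:-4] + '\n')
You cannot use the in operator to check whether a list is contained in a string; its left operand must itself be a string.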
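When writing such a loop, the "not found" outcome should only be reported after every keyword has been tried, not after the first miss. One way to express that in Python is the for ... else construct, sketched below in isolation; the function name first_keyword_in and the sample strings are invented for the example.

```python
def first_keyword_in(text, keywords):
    """Return the first keyword found in text, or None if none occur."""
    for keyword in keywords:
        if keyword in text:
            found = keyword
            break  # stop at the first hit; this skips the else branch
    else:
        # Reached only when the loop ran to completion without a break.
        found = None
    return found


keywords = ['Example', 'Use', 'Of', 'Keywords']

print(first_keyword_in('Only Keywords appear here', keywords))  # Keywords
print(first_keyword_in('nothing relevant', keywords))           # None
```

The first call shows that a keyword late in the list is still found, because a miss on an earlier keyword does not end the loop.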
Upvotes: 0