Reputation: 51
I'm very new to programming. I've searched this site and Google, but can't seem to resolve this issue. I'm finding similar topics, but still can't figure this out...
I have a text file containing a very large list of words. The words are all numbered and also categorized by 'noun', 'adjective' or 'verb'.
I'd like to extract the words from this list, but exclude the numbers and the three category words 'noun', 'adjective', and 'verb'.
I know I need to use the caret character, but can't seem to make it work.
import re
import os
textFile = open('/Users/MyComputer/wordList.txt')
textFileContent = textFile.read()
wordFinder = re.compile(r"""
[a-z]+ # finds words
[^noun|adjective|verb] # THIS IS WRONG
""", re.VERBOSE | re.I)
regexResults = wordFinder.findall(textFileContent)
Upvotes: 1
Views: 91
Reputation: 6526
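I suggest using a negative lookahead. Adding word boundaries keeps the match from starting mid-word and from rejecting longer words such as "verbose", which gives this regular expression:
\b(?!(?:noun|adjective|verb)\b)([a-z]+)\b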
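For context, `[^noun|adjective|verb]` in the question is a character class: it matches one single character that is not any of those letters or `|`, so it cannot exclude whole words. Below is a minimal sketch of the lookahead approach with word boundaries added. The `sample` text and the `extract_words` helper are hypothetical, since the question doesn't show the actual layout of wordList.txt.

```python
import re

# Hypothetical sample; the real layout of wordList.txt is not shown in the question.
sample = """1 apple noun
2 quick adjective
3 run verb
4 verbose adjective
"""

# \b anchors each match at the start of a word. The negative lookahead
# rejects the three category names only when they appear as whole words,
# so "verbose" is still accepted. [a-z]+ never matches digits.
word_finder = re.compile(r"\b(?!(?:noun|adjective|verb)\b)([a-z]+)\b", re.I)


def extract_words(text):
    """Return alphabetic words, skipping numbers and the category labels."""
    return word_finder.findall(text)


print(extract_words(sample))
# prints ['apple', 'quick', 'run', 'verbose']
```

To run it on the real file, you would pass the result of `textFile.read()` to `extract_words` instead of `sample`.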
Upvotes: 0
Reputation: 732
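import re

with open('wordList.txt') as f:
    for line in f:
        # Keep only lines that do not start with a category name or a digit.
        if re.search(r"^(?!noun|adjective|verb|\d)", line):
            # Each line already ends with a newline, so don't add another.
            print(line, end="")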
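This line filter only helps if every token sits on its own line, which the question doesn't confirm. Here is a small sketch under that assumption, using `io.StringIO` as a hypothetical stand-in for the file; `keep_pattern` and `keep_lines` are illustrative names.

```python
import io
import re

# Hypothetical stand-in for wordList.txt, one token per line (an assumption).
fake_file = io.StringIO("1\napple\nnoun\n2\nquick\nadjective\n")

# Matches at the start of a line only if the line does not begin with
# one of the category names or a digit.
keep_pattern = re.compile(r"^(?!noun|adjective|verb|\d)")


def keep_lines(f):
    """Return stripped, non-empty lines that pass the lookahead filter."""
    return [line.strip() for line in f if line.strip() and keep_pattern.search(line)]


kept = keep_lines(fake_file)
print(kept)
# prints ['apple', 'quick']
```

Because the lookahead has no word boundary, a line such as "nouns" would be dropped as well. If the number, word, and category share one line, a word-level pattern is probably a better fit than a line filter.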
Upvotes: 1