CrashTestDummy

Reputation: 51

Python regex to find words, which also excludes particular words

I'm very new to programming. I've searched this site and Google, but can't seem to resolve this issue. I'm finding similar topics, but still can't figure this out...

I have a text file containing a very large list of words. The words are all numbered and also categorized by 'noun', 'adjective' or 'verb'.

I'd like to extract the words from this list, but exclude the numbers and the three words 'noun', 'adjective' and 'verb'.

I know I need to use the caret character, but can't seem to make it work.

import re
import os

textFile = open('/Users/MyComputer/wordList.txt')

textFileContent = textFile.read()

wordFinder = re.compile(r"""
[a-z]+ # finds words
[^noun|adjective|verb] # THIS IS WRONG
""", re.VERBOSE | re.I)

regexResults = wordFinder.findall(textFileContent)

Upvotes: 1

Views: 91

Answers (2)

Laurent H.

Reputation: 6526

I suggest using a negative lookahead, which gives a regular expression like this:

[^a-z](?!noun|adjective|verb)([a-z]+)
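A minimal Python sketch of the lookahead idea, for illustration only. It assumes entries look like `1 apple noun`, which the question does not confirm, and it swaps the leading `[^a-z]` for word boundaries so that a word at the very start of the text can match and words that merely begin with an excluded word, such as `nouns`, are kept. The names `WORD_FINDER` and `sample` are hypothetical.

```python
import re

# Hypothetical sample; the real layout of wordList.txt is not shown in the question.
sample = "1 apple noun\n2 quick adjective\n3 run verb\n4 nouns noun\n"

WORD_FINDER = re.compile(
    r"""
    \b                                # start of a word
    (?!(?:noun|adjective|verb)\b)     # reject exactly these three words
    [a-z]+                            # letters only, so numbers never match
    \b                                # end of the word
    """,
    re.VERBOSE | re.IGNORECASE,
)

words = WORD_FINDER.findall(sample)
print(words)
```

The trailing `\b` inside the lookahead is what separates the whole word `noun` from a longer word such as `nouns`. Without it, any word starting with one of the three category names would be rejected as well.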

Upvotes: 0

tkjef

Reputation: 732

import re

with open('wordList.txt') as f:
    for line in f:
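        # Raw string so \d is not treated as a string escape; skip lines that
        # start with a category word or a digit.
        if re.search(r"^(?!noun|adjective|verb|\d)", line):
            print(line, end="")  # line already ends with a newline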
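To see what a line-based filter like this keeps, here is a small sketch that runs on in-memory lines instead of a file. It assumes one token per line, which the question does not confirm, and the names `lines`, `skip` and `kept` are hypothetical. Unlike the pattern above, it adds `\b` after the category words so that a word such as `nounish` is not dropped.

```python
import re

# Hypothetical one-token-per-line layout; the question does not show the real file.
lines = ["1\n", "apple\n", "noun\n", "2\n", "quick\n", "adjective\n", "nounish\n"]

# Pattern.match only matches at the start of the string, so no explicit ^ is needed.
skip = re.compile(r"(?:noun|adjective|verb)\b|\d")

# Keep every line that does not start with a category word or a digit.
kept = [line.strip() for line in lines if not skip.match(line)]
print(kept)
```

Because the test looks only at the start of each line, this approach suits files with one token per line. If a number, a word and a category share a line, a token-level pattern is the better fit.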

Upvotes: 1
