Reputation: 810

How to Find Words Not Containing Specific Letters?

I'm trying to write code that applies a regex to a text file. The file contains these words, one per line:

nana
abab
nanac
eded

My goal is to display the words that do not contain any of the letters in a given substring.

For example, if my substring is "bn", the output should be only eded, because nana and nanac contain "n" and abab contains "b".

I have written some code, but it only checks the first letter of my substring:

import re

substring = "bn"
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                for letter in substring:
                    if len(re.findall(letter, word)) == 0:
                        print(word)
                        #yield word
xstring()

How do I solve this problem?

Upvotes: 1

Answers (4)

Emma

Reputation: 27723

Here, a simple expression such as this one should be enough:

^[^bn]+$

We put b and n in a negated character class, [^bn], which matches any character other than b or n. The ^ and $ anchors then require the whole string to consist of such characters, so any string that contains b or n fails to match.

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^[^bn]+$"

test_str = ("nana\n"
    "abab\n"
    "nanac\n"
    "eded")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # The pattern has no capture groups, so this loop body never runs here.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
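
The pattern above hard-codes b and n. As a rough sketch of how the same negated class could be built from an arbitrary string of letters (the function name words_without is made up for illustration, and re.fullmatch is used in place of the ^ and $ anchors):

```python
import re

def words_without(letters, words):
    """Return the words that contain none of the characters in `letters`."""
    if not letters:
        # "[^]" is not a valid pattern; with no letters, nothing is excluded.
        return list(words)
    # re.escape keeps characters such as "-", "^" or "]" literal inside the class.
    pattern = re.compile("[^" + re.escape(letters) + "]+")
    # fullmatch requires the whole word to match, like ^...$ on a single word.
    return [word for word in words if pattern.fullmatch(word)]

print(words_without("bn", ["nana", "abab", "nanac", "eded"]))  # ['eded']
```

Escaping matters only when the letters may include regex metacharacters; for plain letters such as "bn" the pattern is identical to the one above.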

RegEx

If this expression isn't quite what you need, it can be modified and tested on regex101.com.

Upvotes: 4

Cireo

Reputation: 4427

@Xosrov has the right approach, with a few minor issues and typos. The version below uses the same logic and works:

import re

def xstring(substring, words):
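    # Character class matching any one of the forbidden letters, e.g. [bn];
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    regex = re.compile('[%s]' % re.escape(''.join(sorted(set(substring)))))
    for word in words:
        # Print only the words in which no forbidden letter is found
        if not regex.search(word):
            print(word)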

words = [
    'nana',
    'abab',
    'nanac',
    'eded',
]

xstring("bn", words)

Upvotes: 2

Gilder

Reputation: 11

It might not be the most efficient, but you could use a set intersection. The following snippet prints the value of word only if it contains neither 'b' nor 'n':

if not any(set(word) & set('bn')):
    print(word)
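
For what it's worth, the same check can be written with set.isdisjoint, which avoids building the intersection. A minimal sketch (the helper name has_none_of is hypothetical):

```python
def has_none_of(word, letters):
    """Return True if `word` shares no characters with `letters`."""
    # isdisjoint accepts any iterable, so `word` does not need to be a set.
    return set(letters).isdisjoint(word)

words = ["nana", "abab", "nanac", "eded"]
print([w for w in words if has_none_of(w, "bn")])  # ['eded']
```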

Upvotes: 0

Xosrov

Reputation: 729

If you want to check whether a string contains any of a set of letters, use brackets (a character class).
For example, [bn] matches any word that contains at least one of those letters.

import re
substring = "bn"
regex = re.compile('[' + substring + ']')
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            # Each line holds one word; keep it only if no forbidden letter occurs
            if regex.search(line) is None:
                print(line.strip())
xstring()
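
If a line may hold more than one word, or you want to yield the words as in the commented-out line of the question, something along these lines might work. This is only a sketch: the name clean_words is made up, `letters` is assumed to be non-empty, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io
import re

def clean_words(lines, letters):
    """Yield each word from `lines` that contains none of `letters`."""
    # Assumes `letters` is non-empty; "[]" would not be a valid pattern.
    forbidden = re.compile("[" + re.escape(letters) + "]")
    for line in lines:
        for word in re.findall(r"\w+", line):
            # Checked per word, so one bad word does not discard the whole line.
            if not forbidden.search(word):
                yield word

# io.StringIO stands in for open("deneme.txt") here.
sample = io.StringIO("nana\nabab\nnanac\neded\n")
print(list(clean_words(sample, "bn")))  # ['eded']
```

With the real file you would pass the open file object instead, e.g. inside a with open("deneme.txt") as f: block.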