Reputation: 810
I'm trying to write some code that applies a regex to the words in a text file. My file contains these words, one per line:
nana
abab
nanac
eded
My goal is to display the words that do not contain any of the letters in a given substring.
For example, if my substring is "bn", my output should be only eded, because nana and nanac contain "n" and abab contains "b".
I have written some code, but it only checks the first letter of my substring:
import re
substring = "bn"
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                for letter in substring:
                    if len(re.findall(letter, word)) == 0:
                        print(word)
                        #yield word
xstring()
How do I solve this problem?
Upvotes: 1
Views: 6419
Reputation: 27723
Here, a simple expression is enough:
^[^bn]+$
We put b and n in a negated character class, [^bn], which matches any character other than those two. The + collects one or more such characters, and the ^ and $ anchors force the whole string to match, so any string that contains b or n fails.
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"^[^bn]+$"
test_str = ("nana\n"
    "abab\n"
    "nanac\n"
    "eded")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
If this expression isn't quite what you want, it can be modified and tested on regex101.com.
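If the letters come from a variable rather than being hard-coded, the same negated class can be built at run time. This is only a minimal sketch: the function name words_without_letters is made up for illustration, and it assumes the words have already been split out of the file. It uses re.fullmatch instead of the ^ and $ anchors, and re.escape so that characters such as ] or ^ stay literal inside the class.

```python
import re

def words_without_letters(letters, words):
    """Return the words that contain none of the characters in `letters`.

    Hypothetical helper, not part of the original answer.
    """
    if not letters:
        # Nothing is forbidden, so every word passes.
        return list(words)
    # Same idea as ^[^bn]+$, but built from a variable.
    pattern = re.compile('[^%s]+' % re.escape(letters))
    # fullmatch succeeds only if the whole word consists of allowed characters.
    return [word for word in words if pattern.fullmatch(word)]

print(words_without_letters("bn", ["nana", "abab", "nanac", "eded"]))
# prints ['eded']
```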
Upvotes: 4
Reputation: 4427
@Xosrov has the right approach, with a few minor issues and typos. The version below uses the same logic and works:
import re
def xstring(substring, words):
    regex = re.compile('[%s]' % ''.join(sorted(set(substring))))
    # Excluding words matching regex.pattern
    for word in words:
        if not re.search(regex, word):
            print(word)
words = [
    'nana',
    'abab',
    'nanac',
    'eded',
]
xstring("bn", words)
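To get back to the file-reading and yield parts of the question, the same check could be wrapped in a generator, roughly like this. It is a sketch under assumptions: iter_clean_words is a hypothetical name, letters must be non-empty (an empty character class is not a valid pattern), and io.StringIO stands in for the real file here so the snippet runs on its own. An object returned by open("deneme.txt") could be passed in the same way.

```python
import io
import re

def iter_clean_words(letters, lines):
    """Yield each word found in `lines` that contains none of `letters`.

    Hypothetical helper; `lines` can be an open file or any iterable of strings.
    """
    # re.escape keeps special characters literal inside the brackets.
    forbidden = re.compile('[%s]' % re.escape(letters))
    for line in lines:
        for word in re.findall(r'\w+', line):
            if not forbidden.search(word):
                yield word

# io.StringIO stands in for the text file from the question.
sample = io.StringIO("nana\nabab\nnanac\neded\n")
print(list(iter_clean_words("bn", sample)))
# prints ['eded']
```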
Upvotes: 2
Reputation: 11
It might not be the most efficient, but you could use a set intersection. The following snippet prints the value of word only if it contains neither 'b' nor 'n':
if not (set(word) & set('bn')):
    print(word)
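For a version that runs on its own, something like the following should do. It is a sketch, with has_none_of as a made-up helper name, and it uses set.isdisjoint, which expresses the same "no letters in common" test as the intersection without building the intermediate set.

```python
def has_none_of(word, letters):
    """Return True if `word` shares no characters with `letters`.

    Hypothetical helper for illustration.
    """
    # isdisjoint accepts any iterable, so the word does not need to be a set.
    return set(letters).isdisjoint(word)

words = ["nana", "abab", "nanac", "eded"]
print([word for word in words if has_none_of(word, "bn")])
# prints ['eded']
```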
Upvotes: 0
Reputation: 729
If you want to check whether a string contains any of a set of letters, use brackets.
For example, [bn] matches words that contain at least one of those letters.
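import re
substring = "bn"
regex = re.compile('[' + substring + ']')
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            # keep only the lines that contain none of the letters
            if regex.search(line) is None:
                # strip the trailing newline so print does not add a blank line
                print(line.strip())
xstring()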
Upvotes: 0