Reputation: 61
Suppose I have a list like this:
List = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
I want to search it and return only the items where 'PO' appears as a whole token. I expect RUC_PO-345 as the only output, but RUC_POLO-209 is returned along with it.
Upvotes: 3
Views: 4300
Reputation: 56048
You should be using a regular expression (import re), and this is the regular expression you should be using: r'(?<![A-Za-z0-9])PO(?![A-Za-z0-9])'.
I previously recommended the \b special sequence, but it turns out that '_' is considered part of a word, which isn't what you want here, so it wouldn't work.
This leaves you with the somewhat more complex negative lookbehind and negative lookahead assertions, which is what (?<!...) and (?!...) are, respectively. To understand how those work, read the documentation for Python regular expressions.
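As a minimal sketch of how that pattern behaves on the list from the question (the variable names `items`, `pattern`, and `word_boundary` are only illustrative), the following compares it with the `\b` version that does not work here:

```python
import re

items = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']

# 'PO' must not be preceded or followed by an ASCII letter or digit.
pattern = re.compile(r'(?<![A-Za-z0-9])PO(?![A-Za-z0-9])')
matches = [item for item in items if pattern.search(item)]
print(matches)  # ['RUC_PO-345']

# For comparison: '_' counts as a word character, so there is no
# word boundary between '_' and 'P', and \bPO\b finds nothing.
word_boundary = re.compile(r'\bPO\b')
wb_matches = [item for item in items if word_boundary.search(item)]
print(wb_matches)  # []
```

Note that `search` is used rather than `match`, because 'PO' is not at the start of the strings.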
Upvotes: 0
Reputation: 1331
The pattern:
'_PO[^\w]'
should work with a re.search() or re.findall() call; it will not work with re.match(), because re.match() only matches at the beginning of the string and these strings do not start with an underscore.
The pattern reads: match one underscore ('_'), followed by one capital P, followed by one capital O, followed by one character that is not a word character. For ASCII text, the special sequence '\w' is equivalent to [a-zA-Z0-9_].
'_PO\W'
^ This can be used as a shorter equivalent of the first pattern (credit @JvdV in comments).
'_PO[^A-Za-z]'
This pattern uses a negated character set, meaning any character that is not a letter, in case the dash interferes with either of the first two patterns.
To use this to identify the pattern in a list, you can use a loop:
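import re

my_list = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']

for thing in my_list:
    # re.search scans the whole string; it returns None when nothing matches
    if re.search(r'_PO[^\w]', thing) is not None:
        # do something
        print(thing)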
This will use the re.search call to match the pattern as the True condition in the if conditional. When re doesn't match a string, it returns None; hence the syntax of if re.search() is not None.
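To make the difference between re.search and re.match concrete, here is a small sketch using the shorter `_PO\W` form on the list from the question. The names `found_with_search`, `found_with_match`, and `no_trailing_char` are only illustrative.

```python
import re

my_list = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
pattern = re.compile(r'_PO\W')

# search looks for the pattern anywhere in the string
found_with_search = [s for s in my_list if pattern.search(s) is not None]
print(found_with_search)  # ['RUC_PO-345']

# match only tries at position 0, and no item starts with '_'
found_with_match = [s for s in my_list if pattern.match(s) is not None]
print(found_with_match)  # []

# Caveat: \W must consume a character, so an item ending in '_PO' is not found
no_trailing_char = pattern.search('RUC_PO')
print(no_trailing_char)  # None
```

If items can end directly after 'PO', a lookahead such as `_PO(?!\w)` may be a better fit, since it does not require a following character.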
Hope it helps!
Upvotes: 1
Reputation: 75850
Before updated question:
As per my comment, I think you are using the wrong approach. To me it seems you can simply use in:
words = ['cat', 'caterpillar', 'monkey', 'monk', 'doggy', 'doggo', 'dog']
if 'cat' in words:
    print("yes")
else:
    print("no")
Returns: yes
words = ['cats', 'caterpillar', 'monkey', 'monk', 'doggy', 'doggo', 'dog']
if 'cat' in words:
    print("yes")
else:
    print("no")
Returns: no
After updated question:
Now, if your sample data does not actually reflect your needs and you are instead interested in finding a substring within a list element, you could try:
import re
words = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
srch = 'PO'
r = re.compile(fr'(?<=_){srch}(?=-)')
print(list(filter(r.findall, words)))
Or using match:
import re
words = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
srch = 'PO'
r = re.compile(fr'^.*(?<=_){srch}(?=-).*$')
print(list(filter(r.match, words)))
This will return a list of the items that follow the pattern (in this case just ['RUC_PO-345']). I used the above pattern to make sure your search value is not matched at the start of a string, but only directly after an underscore and directly before a -.
Now if you have a list of products you want to find, consider the below:
import re
words = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
srch = ['PO', 'QW']
r = re.compile(fr'(?<=_)({"|".join(srch)})(?=-)')
print(list(filter(r.findall, words)))
Or again using match:
import re
words = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
srch = ['PO', 'QW']
r = re.compile(fr'^.*(?<=_)({"|".join(srch)})(?=-).*$')
print(list(filter(r.match, words)))
Both would return: ['MX_QW-765', 'RUC_PO-345']
Note that if your Python version does not support f-strings (they require Python 3.6 or later), you can also concatenate your variable into the pattern.
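A sketch of the concatenation variant, under the assumption that the search terms are plain text rather than regex fragments. The call to re.escape is an addition that is not in the answer above; it guards against terms containing regex metacharacters. The names `alternation`, `pattern`, and `result` are illustrative.

```python
import re

words = ['MX_QW-765', 'RUC_PO-345', 'RUC_POLO-209']
srch = ['PO', 'QW']

# Escape each term so characters like '.' or '+' are taken literally
alternation = '|'.join(re.escape(term) for term in srch)

# Same lookbehind/lookahead as above, built without an f-string
pattern = re.compile('(?<=_)(?:' + alternation + ')(?=-)')

result = [w for w in words if pattern.search(w)]
print(result)  # ['MX_QW-765', 'RUC_PO-345']
```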
Upvotes: 5
Reputation: 541
We can try matching one of the three exact words 'cat', 'dog', 'monk' in our regex string.
Our regex string is going to be "\b(?:cat|dog|monk)\b".
\b is used to define a word boundary. We use \b so that we search for whole words only (this is the exact problem you were facing). With it, the pattern does not match tomcat or caterpillar, only cat.
Next, (?:) is called a non-capturing group: it groups the alternatives without capturing the matched text.
Now we need to match either one of cat, dog, or monk. This is expressed as cat|dog|monk. In Python 3 this would be:
import re
words = ['cat', 'caterpillar', 'monkey', 'monk', 'doggy', 'doggo', 'dog']
regex = r"\b(?:cat|dog|monk)\b"
r=re.compile(regex)
matched = list(filter(r.match, words))
print(matched)
To apply the regex to every item of an iterable list, we use the filter function, as mentioned in a Stack Overflow answer.
You can find the runnable Python code here
NOTE: Finally, regex101 is a great online tool to try out different regex strings and get their explanation in real-time. The explanation for our regex string is here
Upvotes: 0
Reputation:
You need to add a $ sign, which signifies the end of a string. You can also add a ^, which signifies the start of a string, so that only cat matches:
^cat$
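A brief sketch of how the anchored pattern could be applied to a list (the sample `words` list here is made up for illustration). It also shows re.fullmatch, which is an alternative not mentioned above that requires the entire string to match without writing the anchors.

```python
import re

words = ['cat', 'caterpillar', 'tomcat', 'cats']

# ^ and $ pin the match to the start and end of each string
anchored = re.compile(r'^cat$')
exact = [w for w in words if anchored.search(w)]
print(exact)  # ['cat']

# fullmatch succeeds only if the whole string matches the pattern
same = [w for w in words if re.fullmatch(r'cat', w)]
print(same)  # ['cat']

# One difference: $ also matches just before a trailing newline
with_newline = anchored.search('cat\n') is not None
full_with_newline = re.fullmatch(r'cat', 'cat\n') is not None
print(with_newline, full_with_newline)  # True False
```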
Upvotes: 0
Reputation: 521239
Try building a regex alternation using the search terms in the list:
import re

words = ['cat', 'caterpillar', 'monkey', 'monk', 'doggy', 'doggo', 'dog']
your_text = 'I like cat, dog, rabbit, antelope, and monkey, but not giraffes'
# Join the search terms into one alternation wrapped in word boundaries
regex = r'\b(?:' + '|'.join(words) + r')\b'
print(regex)
matches = re.findall(regex, your_text)
print(matches)
This prints:
\b(?:cat|caterpillar|monkey|monk|doggy|doggo|dog)\b
['cat', 'dog', 'monkey']
You can clearly see the regex alternation which we built to find all matching keywords.
Upvotes: 1