Reputation: 1990
In short: I'd like to match any "word" (a contiguous set of characters separated by whitespace) containing at least 1 letter and at least 1 of (numbers/certain special characters). These "words" can appear anywhere in a sentence.
I'm trying this in Python, using re.
So far, as a pattern, I have:
\w*[\d@]\w*
This works for the most part; however, I don't want to match "words" that consist only of numbers/special characters. For example:
Should match:
h1DF346
123FE453
3f3g6hj7j5v3
hasdf@asdf
r3
r@
Should not match:
555555
@
hello
onlyletters
I'm having trouble excluding the first two under "should not match". I feel like there's something simple I'm missing here. Thanks!
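A minimal reproduction of the problem with Python's re, assuming the sample words are joined into one space-separated string (the real input may differ). Because \w*[\d@]\w* only requires a single digit or @, words with no letter at all still match:

```python
import re

# Sample words from the question, joined with spaces (assumed input shape)
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Only one digit or @ is required; nothing forces a letter to be present
matches = re.findall(r"\w*[\d@]\w*", text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@', '555555', '@']
```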
Upvotes: 5
Views: 25248
Reputation: 43169
While you already have your answer, you could still improve the speed of the accepted regex:
(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}
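You'll need the newer regex module here (a third-party package; the built-in re module only supports possessive quantifiers such as ++ from Python 3.11 on):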
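import regex as re  # third-party module: pip install regex
string = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"
# Lookahead: a run of digits followed by letters (plus at least one more character),
# or a run of letters followed by at least one more word character or @.
# The possessive ++ stops the engine from backtracking into those runs, which saves steps.
rx = re.compile(r'(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}')
print(rx.findall(string))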
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
Hijacking @Roberto's demo, you get a significant reduction in the number of steps needed to find the matches (more than 7000 vs. 338, roughly 20 times fewer).
Upvotes: 0
Reputation: 381
If you merely change the * (match 0 or more) to + (match 1 or more), the lone @ is no longer matched:
\w+[\d@]\w+
However, 555555 is still matched, and the two-character words r3 and r@ are now missed, because + requires at least one word character on each side of [\d@]. Is there any further pattern to the distribution of letters and numbers that you can distinguish? Could you handle it by replacing a \w with a requirement for at least one letter directly before or after the [\d@]?
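One way to act on the "letter before or after the [\d@]" idea, sketched with Python's re (this pattern is my own construction, not from the answer): a word made only of letters, digits and @ that contains both a letter and a digit/@ must have a letter directly next to a digit/@ somewhere, so requiring one such adjacent pair is enough. The (?<!\S) and (?!\S) lookarounds are an added assumption that words are whitespace-separated, so only whole words match:

```python
import re

# Hypothetical pattern: a letter directly before or after a digit/@,
# padded on both sides with any further letters, digits or @.
# (?<!\S) / (?!\S) require whitespace or a string edge around the match.
pattern = re.compile(
    r"(?<!\S)[A-Za-z\d@]*(?:[A-Za-z][\d@]|[\d@][A-Za-z])[A-Za-z\d@]*(?!\S)"
)

text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"
result = pattern.findall(text)
print(result)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```

Using [A-Za-z\d@] instead of \w keeps an underscore from breaking the letter/digit adjacency check.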
Upvotes: 1
Reputation: 2748
Use lookahead assertions like this.
Regex: (?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+
Explanation:
(?=.*[a-zA-Z]) checks that at least one letter occurs at or after the current position (anywhere up to the end of the line).
(?=.*[@#\d]) checks that at least one character from the given character class (@, # or a digit) occurs at or after the current position.
[a-zA-Z\d@#]+ matches one or more characters from the given character class.
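A caveat worth checking: . matches any character except a newline, so the lookaheads can be satisfied by a later word on the same line. When the question's words share one line, re.findall with this regex also returns 555555 and @. A sketch of a variant, assuming whitespace-separated words, swaps .* for \S* so each lookahead stays inside the current word and adds (?<!\S)/(?!\S) boundaries:

```python
import re

text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Regex from the answer: the .* lookaheads may look past the current word
original = re.compile(r"(?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+")

# Variant: \S* keeps each lookahead inside the current whitespace-separated word,
# and (?<!\S) ... (?!\S) make the match start and end on word edges
bounded = re.compile(r"(?<!\S)(?=\S*[a-zA-Z])(?=\S*[@#\d])[a-zA-Z\d@#]+(?!\S)")

print(original.findall(text))
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@', '555555', '@']
print(bounded.findall(text))
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```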
Upvotes: 0
Reputation: 2185
I would use the | (or) operator like this:
([A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)
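meaning you want either one or more letters followed by one or more digits/@, or one or more digits/@ followed by one or more letters, in both cases followed by any further word characters or @.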
Consider using a non-capturing group (?:...) instead of (...) if you are working with groups in other parts of your regular expression.
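A quick check in Python of the alternation, in both the capturing form and the non-capturing form suggested above, run on the question's samples joined into one string (an assumed input shape). With a single capturing group, re.findall returns that group's text, which here spans the whole match, so both give the same list:

```python
import re

text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Letters then digits/@, or digits/@ then letters; [\w@]* takes the rest of the word
capturing = re.compile(r"([A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)")
non_capturing = re.compile(r"(?:[A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)")

print(capturing.findall(text))
print(non_capturing.findall(text))
# both: ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```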
Upvotes: 4