Rodrigo Villalba Zayas

Reputation: 5636

Finding all possible substrings within a string. Python Regex

I want to find all possible substrings inside a string that meet the following requirement: the substring starts with N, the second letter is anything but P, and the third letter is S or T.

With the test string "NNSTL", I would like to get "NNS" and "NST" as results.

Is this possible with Regex?

Upvotes: 5

Views: 2382

Answers (3)

Aaron Hall

Reputation: 395843

You can do this with the re module:

import re

Here's a sample string to search:

my_txt = 'NfT foo NxS bar baz NPT'

We use a regular expression that looks for an N, then any character other than a P, then a character that is either an S or a T:

regex = 'N[^P][ST]'

and using re.findall:

found = re.findall(regex, my_txt)

and found contains:

['NfT', 'NxS']
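As a quick check against the test string from the question, here is a minimal sketch using the same pattern. The variable names are only illustrative. Because re.findall returns non-overlapping matches, only the first of the two desired substrings comes back:

```python
import re

# Same pattern as above, written as a raw string.
regex = r'N[^P][ST]'

# The test string from the question.
question_txt = 'NNSTL'

# findall scans left to right and resumes after the end of each match,
# so "NST" (which overlaps "NNS") is not reported.
found_in_question = re.findall(regex, question_txt)
print(found_in_question)  # ['NNS']
```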

Upvotes: 2

CJ Dennis

Reputation: 4356

Try the following regex:

N[^P\W\d_][ST]

The first character is N. The second character is anything except (^) a P, a non-word character (\W), a digit (\d), or an underscore (_), which leaves only letters other than P. The last character is either S or T. I'm assuming the second character must be a letter.
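To illustrate the difference the extra exclusions make, here is a small sketch comparing this pattern with a plain N[^P][ST]. The sample text and variable names are made up for the comparison:

```python
import re

# Strict pattern: the middle character must be a letter other than P.
strict = re.compile(r'N[^P\W\d_][ST]')
# Loose pattern: the middle character may be anything except P.
loose = re.compile(r'N[^P][ST]')

# Hypothetical sample text with a letter, digit, underscore, hyphen, and P
# in the middle position.
sample = 'NaS N1S N_T N-S NPT'

strict_hits = strict.findall(sample)
loose_hits = loose.findall(sample)

print(strict_hits)  # ['NaS']
print(loose_hits)   # ['NaS', 'N1S', 'N_T', 'N-S']
```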

EDIT

The above regex will only match the first instance in the string "NNSTL", because after matching "NNS" it starts looking for the next potential match at position 3: "TL". If you truly want both results at the same time, use the following:

(?=(N[^P\W\d_][ST])).

The substring will be in group 1 rather than in the whole pattern match, which will only be the first character.
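Here is a minimal sketch of how this could be used from Python. It relies on re.findall returning the contents of the group when the pattern has exactly one capturing group. The function name is only illustrative:

```python
import re

# Lookahead pattern from above: the lookahead captures the three-character
# substring without consuming it, and the trailing "." consumes a single
# character so the next attempt starts one position later.
overlap_pattern = re.compile(r'(?=(N[^P\W\d_][ST])).')


def find_overlapping(text):
    """Return every matching substring, including overlapping ones."""
    # With one capturing group, findall returns the group 1 strings.
    return overlap_pattern.findall(text)


print(find_overlapping('NNSTL'))  # ['NNS', 'NST']
```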

Upvotes: 4

ethguo

Reputation: 180

Yes. The regex is: "N[^P][ST]"

Plug it into any of the re module functions documented here: http://docs.python.org/2/library/re.html

Explanation:

  • N matches a literal "N".
  • [^P] is a set, where the caret ("^") denotes inverse (so it matches anything not in the set).
  • [ST] is another set, where it matches either an "S" or a "T".
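
The three parts described above can be checked one candidate at a time. This is only a sketch: the candidate strings are made up, and re.fullmatch requires Python 3.4 or later:

```python
import re

pattern = re.compile(r'N[^P][ST]')

# Hypothetical three-letter candidates exercising each part of the pattern.
candidates = ['NNS', 'NST', 'NPS', 'NAX', 'ANS']

# fullmatch succeeds only if the entire candidate matches the pattern.
results = {c: bool(pattern.fullmatch(c)) for c in candidates}

for candidate, matched in results.items():
    print(candidate, matched)
# NNS True
# NST True
# NPS False  (second character is P)
# NAX False  (third character is neither S nor T)
# ANS False  (does not start with N)
```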

Upvotes: 1
