Reputation: 795
I'm preparing for an exam, and here's the task:
Given a string which consists of uppercase letters of English alphabet, find the longest substring of it, which doesn't contain QW
or WQ
.
I know I could do re.split
or something like that, but I made it a challenge for me to do it with a "regex matching" expression like len(max(re.findall(...), key=len))
without using split
or other methods. Is it even possible?
To find all matching subtrings, I tried this:
list(map(lambda x: x[0], re.findall(r'(((?<!QW|WQ).)+(?!QW|WQ))', text))
But this does match a substring which ends with WQ
, for example. How do I fix this?
Let me also clarify something. Let's say the string is WQABCDEFGHQW
. The answer in this case is QABCDEFGHQ
, as this is exactly the longest substring that doesn't contain WQ
or QW
.
Upvotes: 1
Views: 379
Reputation: 27609
.?(?:.(?<!QW|WQ))*
Any single character is ok. Any further character is ok unless it creates QW or WQ.
Python demo along with two split-solutions:
input: 'AQWBQWC':
find: ['AQ', 'WBQ', 'WC', '']
split1: ['AQ', 'WBQ', 'WC']
split2: ['AQ', 'WBQ', 'WC']
input: 'AQWBWQC':
find: ['AQ', 'WBW', 'QC', '']
split1: ['AQ', 'WBW', 'QC']
split2: ['AQ', 'WBW', 'QC']
input: '':
find: ['']
split1: ['']
split2: ['']
input: 'A':
find: ['A', '']
split1: ['A']
split2: ['A']
input: 'Q':
find: ['Q', '']
split1: ['Q']
split2: ['Q']
input: 'QW':
find: ['Q', 'W', '']
split1: ['Q', 'W']
split2: ['Q', 'W']
(The extra empty matches don't matter, just like other non-longest matches don't matter. Will get discarded by your max(..., key=len)
.)
Code (Try it online!):
import re
find = r'.?(?:.(?<!QW|WQ))*'
split1 = r'(?<=Q)(?=W)|(?<=W)(?=Q)'
split2 = r'(?=.(?<=QW|WQ))'
for s in 'AQWBQWC', 'AQWBWQC', '', 'A', 'Q', 'QW':
print('input: ', repr(s) + ':')
print('find: ', re.findall(find, s))
print('split1:', re.split(split1, s))
print('split2:', re.split(split2, s))
print()
Upvotes: 2
Reputation: 89557
You could use that: (?:(?!WQ|QW).(?<!WQ|QW))+
At each position (of the dot) it tests forward and backward if the character isn't a part of WQ or QW.
The key is to test forward before the dot and backward after the dot position.
Other possible pattern: (?:.(?!.?(?<=WQ|QW)))+
This time a lookbehind is inside a negative lookahead but since it is preceded by an optional character .?
, it checks the two possible positions of the WQ/QW sequence.
Or to reduce the number of steps an unrolled pattern:
[^WQ]+(?:[QW](?!.?(?<=QW|WQ))[^WQ]*)*|(?:[QW](?!.?(?<=QW|WQ))[^WQ]*)+
But it's a bit long (it's unrolled).
Upvotes: 1
Reputation: 5390
You could use
(?:(?!Q(?!W)|(?<!W)Q).)+
import re
s = "HAKSDUWQHUPHPSAHFPUAHPUSNFJHJWQHPJQWPHJWQWASDIAS"
print(max(re.findall(r"(?:(?!Q(?!W)|(?<!W)Q).)+", s), key=len)) # 'HUPHPSAHFPUAHPUSNFJHJW'
See the regex demo
Upvotes: 0
Reputation: 626870
You can use
(?:[A-Z](?<!QW|WQ)(?<!Q(?=W)|W(?=Q)))+
See the regex demo.
Details:
(?:
- start of a non-capturing group:
[A-Z]
- an uppercase ASCII letter(?<!QW|WQ)
- a negative lookbehind that fails the match if, immediately on the left, there is a QW
or WQ
substring (i.e. if the [A-Z]
matched W
(preceded with Q
) or Q
(preceded with W
))(?<!Q(?=W)|W(?=Q))
- a negative lookbehind that fails the match if, immediately on the left, there is a Q
immediately followed with W
, or W
that is immediately followed with Q
(i.e. if [A-Z]
matched Q
and the next char is W
, or if [A-Z]
matches W
and the next char is Q
))+
- end of the group, one or more occurrences.Another approach:
(?:(?!QW|WQ)[A-Z](?<!QW|WQ))+
See this regex demo. Details:
(?:
- start of a non-capturing group:
(?!QW|WQ)
- a negative lookahead that fails the match if, immediately on the right, there is a QW
or WQ
substring[A-Z]
- an uppercase ASCII letter(?<!QW|WQ)
- a negative lookbehind that fails the match if, immediately on the left, there is a QW
or WQ
substring)+
- end of the group, one or more occurrences.Upvotes: 2