Reputation: 2584
I'm using Python, but code in any language will do for this question.
Suppose I have 2 strings.
sequence = 'abcd'
string = 'axyzbdclkd'
In the above example, sequence is a subsequence of string. How can I check whether sequence is a subsequence of string using regex? Also check the examples here for the difference between a subsequence and a subarray, and for what I mean by subsequence.
The only thing I could think of is this, but it's far from what I want.
import re
c = re.compile('abcd')
c.match('axyzbdclkd')
Upvotes: 7
Views: 3868
Reputation: 26139
I don't think the solution is as simple as @schwobaseggl claims. Let me show you another sequence from your database: ab1b2cd. Using the abcd subsequence for pattern matching, you can get two results: ab(1b2)cd and a(b1)b(2)cd. So for testing purposes the proposed ^.*a.*b.*c.*d.*$ is ok(ish), but for parsing, ^a(.*)b(.*)cd$ is greedy in its first group and will always give the second result. To get the first result, you need to make that group lazy: ^a(.*?)b(.*)cd$. So if you need this for parsing, you should know how many variables are expected. To optimize the regex pattern, parse a few example strings and put capturing groups only at the gaps where you really need them. A more advanced version would use the pattern of the actual variable instead of .*, for example ^ab(\d\w\d)cd$ in the first case or ^a(\w\d)b(\d)cd$ in the second case.
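The greedy versus lazy difference can be checked directly. This is a minimal sketch using only the standard re module and the ab1b2cd example from above; the variable names are illustrative.

```python
import re

text = 'ab1b2cd'

# Greedy first group: it grabs as much as possible, so the pattern's 'b'
# lands on the last 'b' in the string.
greedy = re.match(r'^a(.*)b(.*)cd$', text)
print(greedy.groups())  # ('b1', '2')

# Lazy first group: it grabs as little as possible, so the pattern's 'b'
# lands on the first 'b' in the string.
lazy = re.match(r'^a(.*?)b(.*)cd$', text)
print(lazy.groups())  # ('', '1b2')

# Replacing .* with the shape of the expected gap removes the ambiguity.
typed = re.match(r'^ab(\d\w\d)cd$', text)
print(typed.groups())  # ('1b2',)
```

Both of the first two patterns match the string, so they are equivalent for a yes/no test; they only differ in what the groups capture.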
Upvotes: 0
Reputation: 476594
For an arbitrary sequence, you can construct a regex like:
import re
sequence = 'abcd'
rgx = re.compile('.*'.join(re.escape(x) for x in sequence))
For 'abcd', this results in the regex 'a.*b.*c.*d'. You can then use rgx.search(..):
the_string = 'axyzbdclkd'
if rgx.search(the_string):
    # ... the sequence is a subsequence.
    pass
By using re.escape(..), you know for sure that, for instance, a '.' in the original sequence will be translated to '\.' and will thus match only a literal dot rather than any character.
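Wrapped up as a function, the approach might look like the sketch below. The name is_subsequence is hypothetical, and passing re.DOTALL is my own assumption: by default '.' does not match a newline, so without that flag the check could fail on multi-line input.

```python
import re

def is_subsequence(sequence, string):
    """Return True if the characters of `sequence` occur in `string` in order."""
    # re.escape keeps metacharacters such as '.' literal.
    # re.DOTALL (an addition, not in the answer above) lets '.*' cross newlines.
    pattern = '.*'.join(re.escape(ch) for ch in sequence)
    return re.search(pattern, string, flags=re.DOTALL) is not None

print(is_subsequence('abcd', 'axyzbdclkd'))  # True
print(is_subsequence('abcd', 'dcba'))        # False
print(is_subsequence('a.c', 'abc'))          # False: '.' must be a literal dot
print(is_subsequence('a.c', 'a-.-c'))        # True
```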
Upvotes: 3
Reputation: 73460
Just allow arbitrary strings in between:
c = re.compile('.*a.*b.*c.*d.*')
# .* any character, zero or more times
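Whether the leading and trailing .* are needed depends on which method you call on the compiled pattern. A small sketch, using a made-up variant of the example string that does not start with 'a':

```python
import re

# Hypothetical input: the question's string with an extra 'x' in front.
string = 'xaxyzbdclkd'

bare = re.compile('a.*b.*c.*d')
padded = re.compile('.*a.*b.*c.*d.*')

print(bool(bare.match(string)))        # False: match() only tries position 0
print(bool(bare.search(string)))       # True: search() scans the whole string
print(bool(padded.match(string)))      # True: the leading .* absorbs the prefix
print(bool(padded.fullmatch(string)))  # True: the trailing .* absorbs the suffix
```

So with search() the bare pattern is enough, while match() needs the leading .* and fullmatch() needs both.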
Upvotes: 9