Reputation: 1535
I have the following regex:
^([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?)(\d{2}|\d{3}|\d{6})(\d{2}|\d{3})$
I use this regex to match different, yet similar strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
# MF001317-077944-01
MF00131707794401 # string provided
These strings need to match/group as it is at the top of the strings, however my problem is that it is not grouping it correctly
The first string: MOR644004007001
is grouped: (MOR644004) (007) (001)
which should be (MOR644) (004) (007) (001)
The second string: VUF001010500801
is grouped (VUF001010) (500) (801)
which should be (VUF00101) (050) (08) (01)
How can I change ([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?)
so that it would group the provided string correctly?
Upvotes: 1
Views: 70
Reputation: 1134
Take a look at this. I have used what's called as a named group. As pointed out earlier by others, it's better to have one regex code for each string. I have shown here for the first string, MOR644004007001
. Easily you can expand for other two strings:
import re
# MOR644-004-007-001
MOR = "MOR644004007001" # string provided
# VUF00101-050-08-01
VUF = "VUF001010500801" # string provided
# MF001317-077944-01
MF = "MF00131707794401" # string provided
MORcompile = re.compile(r'(?P<first>\w{,6})(?P<second>\d{,3})(?P<third>\d{,3})(?P<fourth>\d{,3})')
MORsearch = MORcompile.search(MOR.strip())
print MORsearch.group('first')
print MORsearch.group('second')
print MORsearch.group('third')
print MORsearch.group('fourth')
MOR644
004
007
001
Upvotes: 0
Reputation: 8332
Considering your comment to Serv I'd say the (only?) solution is to have one regex for each possibility, like -
MOR(\d{3})(\d{3})(\d{3})(\d{3})|VUF(\d{5})(\d{3})(\d{2})(\d{2})|MF(\d{6})(\d{6})(\d{2})
and then use the execution environment (JS/php/python - you haven't provided which one) to piece the parts together.
See example on regex101 here. Note that substitution, only as an example, matches only the second string.
Regards
Upvotes: 1
Reputation: 165
I am not sure that you can do what you want to. Let's consider the first two strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
Now, both the strings are composed of 3 chars followed by 12 digits. Thus, given a regex R, if R does not depend on particular (sequences of) characters and on particular (sequences of) digits (i.e., it presents [A-Za-z]
and \d
but does not present, let's say, MO
and 0070
), then it will match both the string in the same way.
So, if you want to operate a different matching, then you need to look at the particular occurrence of certain characters or digits. We need more data from you in order to give you an aswer.
Finally, I suggest you to take a look at this tool:
http://regex.inginf.units.it/ (demo: http://regex.inginf.units.it/demo.html). It is a research project that automatically generates a regex given (many) examples of extraction. I warmly suggest you to try it, especially if you know that an underlying pattern is present in your case for sure (i.e. strings beginning with VUF
must be matched differently from strings beginning with MOR
) but you are unable to find it. Again, you will need to provide many examples to the engine. Needles to say, if a generic pattern does not exist, then the tool won't find it ;)
Upvotes: 2