Reputation: 387
I have a string ' DIM D =9999 \ PE TS D(A(4))' and want to use a regular expression to divide it into several parts. Note that 'DIM' and '=' are constant, and '=' is always followed by a number, but the content after the number can vary a lot. The amount of whitespace between 'DIM' and '=' may also differ from string to string, and the same goes for the substrings that follow. Also, I want 'PE TS' to be recognized as one element.
So I am thinking this string can be split into several groups: 'DIM', 'D', '=9999', '\', 'PE TS', 'D(A(4))'.
I've tried re.match but I cannot find a good pattern expression for it.
match = re.match('(DIM\s+\S)(\d*)(\S+)([\w\s]*)(\s*\w*)', line)
I expected to see:
'DIM', '= 9999', '\', 'PE TS','D(A(4))'
But, I always get None returned.
Upvotes: 1
Views: 674
Reputation: 27723
Maybe, an expression somewhat similar to,
(DIM\s+\S+)\s*(=\s*\d+)\s*(\S+)\s*(.+?)\s{2,}(.+)
might just work OK, not sure though.
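import re
regex = r"(DIM\s+\S+)\s*(=\s*\d+)\s*(\S+)\s*(.+?)\s{2,}(.+)"
# Raw string so the backslash stays literal. The pattern's \s{2,} needs
# at least two whitespace characters before the last field.
test_str = r"""
DIM D =9999 \ PE TS  D(A(4))
DIM AZ = 9999 \ PE TS AC AB  D(A(4))
"""
print(re.findall(regex, test_str))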
[('DIM D', '=9999', '\\', 'PE TS', 'D(A(4))'), ('DIM AZ', '= 9999', '\\', 'PE TS AC AB', 'D(A(4))')]
The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it. There you can also watch how it matches against some sample inputs, if you like.
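If you want the six pieces listed in the question ('DIM' and 'D' separately), something like the following might also work. It is only a sketch: the group names and the helper split_dim_line are made up for illustration, and it assumes the last field (such as D(A(4))) never contains whitespace, so it does not depend on a double space before it.

```python
import re

# Verbose pattern: whitespace and "#" comments inside it are ignored.
PATTERN = re.compile(
    r"""
    \s*(?P<keyword>DIM)      # the constant keyword
    \s+(?P<name>\S+)         # variable name, e.g. D or AZ
    \s*(?P<value>=\s*\d+)    # equals sign followed by a number
    \s*(?P<sep>\\)           # the literal backslash
    \s*(?P<middle>.+?)       # one or more words, e.g. PE TS
    \s+(?P<last>\S+)         # final token, assumed to contain no whitespace
    \s*$
    """,
    re.VERBOSE,
)


def split_dim_line(line):
    """Return the six pieces as a tuple, or None if the line does not fit."""
    m = PATTERN.match(line)
    return m.groups() if m else None


print(split_dim_line(r" DIM D =9999 \ PE TS D(A(4))"))
```

Because the pattern itself starts with \s*, re.match is fine here even when the line has leading spaces.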
Upvotes: 1
Reputation:
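I know for sure match won't do what you're expecting, because:
Special characters such as \, ( and ) have to be escaped when you want to match them literally.
match only selects things at the beginning of the string, and your string starts with a space.
Even if you used search and had the right regex, search would match DIM D =9999 \ PE TS D(A(4)) as a whole and not split it.
You need to call .group() or .groups() after re.search(...) to retrieve the matches as strings.
(You would need one piece per part: one for DIM, one for =9999, etc.) Here's how I split the string, though Emma's answer may be better: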
import re
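myString = r' DIM D =9999 \ PE TS D(A(4))'
# Each search pulls one piece out of the string; raw strings keep the backslashes literal.
dim = re.search(r'(DIM\s+\S+)', myString).group()
equals9999 = re.search(r'(=\s*\d+)', myString).group()
backslash = re.search(r'\\', myString).group()
# group(1) is the text after the backslash; strip() drops the whitespace in front of it
twoDoubleLetters = re.search(r'\\(\s+\w+\s+\w+)', myString).group(1).strip()
cellMarker = re.search(r'\w\(\w\(\d\)\)', myString).group()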
print(dim) # DIM D
print(equals9999) # =9999
print(backslash) # \
print(twoDoubleLetters) # PE TS
print(cellMarker) # D(A(4))
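To make the match versus search point concrete, here is a small sketch. The pattern only covers the first three pieces and is just for illustration; the variable names are mine.

```python
import re

line = r" DIM D =9999 \ PE TS D(A(4))"   # note the leading space
pattern = re.compile(r"(DIM)\s+(\S+)\s*(=\s*\d+)")

# match only tries at index 0, where the string has a space, so it fails.
print(pattern.match(line))      # None

# search scans forward until the pattern fits somewhere in the string.
found = pattern.search(line)
print(found.group())            # DIM D =9999  (the whole match)
print(found.groups())           # ('DIM', 'D', '=9999')  (the captured pieces)
```

So .group() gives back the whole matched text, while .groups() gives the individual parenthesized parts.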
Upvotes: 0