Eric

Reputation: 387

How to use regex to split a string into several parts

I have a string ' DIM D =9999 \ PE TS D(A(4))' and want to use a regular expression to divide it into several parts. 'DIM' and '=' are constant, and '=' is always followed by a number, but the content after the number can vary a lot. The amount of whitespace between 'DIM' and '=' differs from string to string, and the same holds for the rest of the string. I also want 'PE TS' to be recognized as a single element. So I think the string can be divided into these groups: 'DIM', 'D', '=9999', '\', 'PE TS', 'D(A(4))'.

I've tried re.match, but I cannot find a pattern that works:

match = re.match('(DIM\s+\S)(\d*)(\S+)([\w\s]*)(\s*\w*)', line)

I expected to see:

'DIM', '= 9999', '\', 'PE TS','D(A(4))'

But I always get None back.

Upvotes: 1

Views: 674

Answers (2)

Emma

Reputation: 27723

An expression similar to the following might work, although I am not certain it covers every case:

(DIM\s+\S+)\s*(=\s*\d+)\s*(\S+)\s*(.+?)\s{2,}(.+)

Test

import re

regex = r"(DIM\s+\S+)\s*(=\s*\d+)\s*(\S+)\s*(.+?)\s{2,}(.+)"
test_str = r"""
   DIM D =9999  \ PE TS                         D(A(4))
    DIM AZ =    9999  \   PE TS AC AB                         D(A(4))
"""

print(re.findall(regex, test_str))

Output

[('DIM D', '=9999', '\\', 'PE TS', 'D(A(4))'), ('DIM AZ', '=    9999', '\\', 'PE TS AC AB', 'D(A(4))')]

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top right panel and shows how it matches against sample inputs.
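For readers who would rather see the explanation inline, here is a sketch of the same expression written with re.VERBOSE so that each group carries a comment. The group names (dim, value, sep, flags, cell) are made up for illustration and do not come from the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each group is commented.
# The group names are illustrative assumptions, not part of the question.
pattern = re.compile(r"""
    (?P<dim>DIM\s+\S+)      # keyword plus the variable name, e.g. "DIM D"
    \s*
    (?P<value>=\s*\d+)      # equals sign and the number, e.g. "=9999"
    \s*
    (?P<sep>\S+)            # the separator token, here a backslash
    \s*
    (?P<flags>.+?)          # lazily matched words, e.g. "PE TS"
    \s{2,}                  # two or more whitespace characters end that field
    (?P<cell>.+)            # the rest of the line, e.g. "D(A(4))"
""", re.VERBOSE)

line = r"   DIM D =9999  \ PE TS                         D(A(4))"
m = pattern.search(line)
parts = m.groupdict() if m else {}
print(parts)
```

Note that the expression depends on a run of at least two whitespace characters before the last field. If the fields can be separated by a single space, it will not match.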

Upvotes: 1

user11104582


I know for sure that match won't do what you expect, because:

  1. Even if your regex covered every item on your list (it has nothing to handle the \, ( and )), match only matches at the beginning of the string, and your string starts with whitespace.
  2. If you used search with the right regex, search would match DIM D =9999 \ PE TS D(A(4)) as a whole and not split it.
  3. If you used search with the right regex and it found a match, you would still need to call .group() or .groups() on the result of re.search(...) to retrieve the matched text as strings.
  4. A function that splits the string the way you want would use separate regex blocks (one for DIM, one for =9999, etc.).
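The points above can be checked with a short sketch. The sample line is the one from the question with wider spacing, and the patterns here are simplified for illustration only.

```python
import re

line = r"   DIM D =9999  \ PE TS                         D(A(4))"

# re.match anchors at position 0, and the line starts with spaces, so this is None.
print(re.match(r"DIM", line))

# re.search scans forward and finds DIM wherever it occurs.
found = re.search(r"DIM\s+\S+", line)
print(found.group())    # the whole match as one string

# Capture groups, read back with .groups(), are what split a match into parts.
split = re.search(r"(DIM)\s+(\S+)\s*(=\s*\d+)", line)
print(split.groups())
```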

Here's how I split the string, though Emma's answer may be better:

import re

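# Raw string so the backslash is kept literally
myString = r'    DIM D =9999  \ PE TS                         D(A(4))'

# Each search pulls out one part of the string
dim = re.search(r'(DIM\s+\S)', myString).group()
equals9999 = re.search(r'(=\s*\d+)', myString).group()
backslash = re.search(r'\\', myString).group()
# Match from the backslash on, then drop the backslash and the single space after it
twoDoubleLetters = re.search(r'\\(\s+\w+\s+\w+)', myString).group()[2:]
cellMarker = re.search(r'\w\(\w\(\d\)\)', myString).group()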

print(dim) # DIM D
print(equals9999) # =9999
print(backslash) # \
print(twoDoubleLetters) # PE TS
print(cellMarker) # D(A(4))
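As an alternative to separate searches, a single pattern with one capture group per part can return the six pieces listed in the question. This is only a sketch: it assumes the last field (such as D(A(4))) never contains whitespace, which the question does not confirm.

```python
import re

# Assumption: the last field (e.g. D(A(4))) never contains whitespace,
# so it can be taken as the final run of non-space characters on the line.
pattern = re.compile(r"(DIM)\s+(\S+)\s*(=\s*\d+)\s*(\\)\s*(.+?)\s+(\S+)\s*$")

line = r" DIM D =9999 \ PE TS D(A(4))"
m = pattern.search(line)
parts = m.groups() if m else ()
print(parts)
```

Because the last field is anchored to the end of the line, this variant does not need two or more spaces between 'PE TS' and the final field.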

Upvotes: 0
