Reputation: 945
My strings have the following format:
test_string = "test (11 MHz - 11 MHz)"
test1_string = 'test1 (11 MHz - 11 MHz)'
I need output like the following, using a regex in Python:
output = ["test1", "11 MHz", "11 MHz"]
Upvotes: 0
Views: 167
Reputation: 18565
One idea: match either the run of non-parenthesis characters at the start of the string, or digits followed by "mhz" anywhere in the string.
res = re.findall(r'(?i)^[^)(]+\b|\d+ mhz', test_string)
See this demo at regex101 or a Python demo at tio.run
(?i) turns on ignore-case matching, so both "Mhz" and "MHz" match
^[^)(]+\b matches one or more non-parenthesis characters from the ^ start of the string up to a \b word boundary
| means OR
\d+ mhz matches one or more digits followed by the specified substring
This will work as long as your input matches the pattern.
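As a quick check, the pattern can be run against both strings from the question. This is only a minimal sketch; the names pattern, samples and results are chosen here for illustration.

```python
import re

# Pattern from the answer: leading non-parenthesis text, or digits followed by "mhz"
pattern = r'(?i)^[^)(]+\b|\d+ mhz'

# The two strings from the question
samples = ["test (11 MHz - 11 MHz)", "test1 (11 MHz - 11 MHz)"]

results = [re.findall(pattern, s) for s in samples]
for r in results:
    print(r)
# ['test', '11 MHz', '11 MHz']
# ['test1', '11 MHz', '11 MHz']
```

Because the first alternative ends at a \b, a label that ends in punctuation (for example "foo.") would come back without that trailing character.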
Upvotes: 2
Reputation: 195633
You can use re.findall to search the text:
import re
text = "A1-A4 US (430 Mhz - 780 Mhz)"
first_text, second_text, third_text = re.findall(r'(.*?US).*?(\d+.Mhz).*?(\d+.Mhz)', text)[0]
print(first_text)
print(second_text)
print(third_text)
Prints:
A1-A4 US
430 Mhz
780 Mhz
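The pattern above depends on the literal text US. If the label varies, one possible variation (an assumption, not part of this answer) anchors on the opening parenthesis instead and returns None when nothing matches. The names RANGE_RE and split_label_and_range are hypothetical.

```python
import re

# Assumed variation: the label is whatever precedes the "(", and "MHz" is
# matched case-insensitively so "Mhz", "MHz" and "mhz" are all accepted.
RANGE_RE = re.compile(
    r'(.*?)\s*\(\s*(\d+\s*MHz)\s*-\s*(\d+\s*MHz)\s*\)',
    re.IGNORECASE,
)

def split_label_and_range(text):
    """Return (label, low, high), or None if the text does not match."""
    m = RANGE_RE.search(text)
    return m.groups() if m else None

print(split_label_and_range("A1-A4 US (430 Mhz - 780 Mhz)"))
# ('A1-A4 US', '430 Mhz', '780 Mhz')
print(split_label_and_range("no frequency range here"))
# None
```

Checking for None avoids the IndexError that indexing [0] on an empty re.findall result would raise.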
Upvotes: 0
Reputation: 2348
Using named groups:
import re
sample = "A1-A4 US (430 Mhz - 780 Mhz)"
split_pat = r"""
(?P<first>.+) # Capture the label before the opening parenthesis
\s\( # Skip the space and the opening parenthesis
(?P<second>\d+\s\bMhz\b) # Capture digits, a space, and Mhz
\s+?\-\s+? # Skip the hyphen and the whitespace around it
(?P<third>\d+\s\bMhz\b) # Capture digits, a space, and Mhz
\) # Require the closing parenthesis
"""
# Use re.X flag to handle verbose pattern string
p = re.compile(split_pat, re.X)
# Search once and reuse the match object
m = p.search(sample)
first_text = m.group('first')
second_text = m.group('second')
third_text = m.group('third')
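Because the groups are named, the match can also be read back as a dictionary with groupdict(). A minimal sketch, with the same pattern written on one line; the None check and the name parts are additions for illustration.

```python
import re

sample = "A1-A4 US (430 Mhz - 780 Mhz)"

# Same pattern as above, written without the verbose layout
p = re.compile(
    r"(?P<first>.+)\s\((?P<second>\d+\s\bMhz\b)\s+?-\s+?(?P<third>\d+\s\bMhz\b)\)"
)

m = p.search(sample)
# groupdict() maps each group name to the text it captured
parts = m.groupdict() if m else {}

print(parts.get('first'))   # A1-A4 US
print(parts.get('second'))  # 430 Mhz
print(parts.get('third'))   # 780 Mhz
```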
Upvotes: 0
Reputation: 1750
This regex seems to do the job: ([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)
The website gives some code you can use for matching with Python:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"
test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
"A7-A8 PS (420 Mhz - 180 Mhz)\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
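If only the captured text is needed and not the match positions, re.findall with the same regex returns one tuple per matching line. A minimal sketch; the name rows is chosen here for illustration.

```python
import re

regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"
test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
            "A7-A8 PS (420 Mhz - 180 Mhz)\n")

# With three capture groups, findall returns a list of 3-tuples
rows = re.findall(regex, test_str)
for name, low, high in rows:
    print([name, low, high])
# ['A1-A4 US', '430 Mhz', '780 Mhz']
# ['A7-A8 PS', '420 Mhz', '180 Mhz']
```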
Upvotes: 0