nrs

Reputation: 945

How to split a string using regex in Python?

Below is my string format.

test_string = "test (11 MHz - 11 MHz)"
test1_string = 'test1 (11 MHz - 11 MHz)'

I need output like the following for test1_string, using regex in Python:

output = ["test1", "11 MHz", "11 MHz"] 

Upvotes: 0

Views: 167

Answers (4)

bobble bubble

Reputation: 18565

One idea: match either a run of non-parenthesis characters at the start, or digits followed by "mhz" anywhere in the string.

res = re.findall(r'(?i)^[^)(]+\b|\d+ mhz', test_string)

See this demo at regex101 or a Python demo at tio.run

  • the (?i) flag ignores case, so both "MHz" and "Mhz" match
  • ^[^)(]+\b the first part matches one or more non-parenthesis characters from the ^ start up to a \b word boundary, which drops the trailing space
  • | OR \d+ mhz matches one or more digits followed by a space and the specified substring

This will work as long as your input matches the pattern.
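
As a quick check, here is a minimal, self-contained run of the pattern above against the test1_string from the question. Compiling the pattern first is just a convenience here, not something the answer requires.

```python
import re

# Pattern from the answer above: leading non-parenthesis text, or digits followed by "mhz".
pattern = re.compile(r'(?i)^[^)(]+\b|\d+ mhz')

test1_string = 'test1 (11 MHz - 11 MHz)'
res = pattern.findall(test1_string)
print(res)  # ['test1', '11 MHz', '11 MHz']
```

Because the first alternative is anchored with ^, it can only match once, at the start of the string. Every later match comes from the \d+ mhz alternative.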

Upvotes: 2

Andrej Kesely

Reputation: 195633

You can use re.findall to search the text:

import re

text = "A1-A4 US (430 Mhz - 780 Mhz)"

first_text, second_text, third_text = re.findall(r'(.*?US).*?(\d+.Mhz).*?(\d+.Mhz)', text)[0]
print(first_text)
print(second_text)
print(third_text)

Prints:

A1-A4 US
430 Mhz
780 Mhz
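
If the label is not always "US", or the input spells the unit "MHz" as in the question, a variation along these lines might help. This is only a sketch, not the answer's exact method: it uses re.search instead of re.findall, anchors on the parentheses rather than a literal "US", and ignores case. The helper name split_band and the constant BAND_RE are made up for illustration.

```python
import re

# Hypothetical helper (not from the answer): a label, then two "<digits> MHz"
# values inside parentheses, matched case-insensitively.
BAND_RE = re.compile(r'(.*?)\s*\((\d+\s*mhz)\s*-\s*(\d+\s*mhz)\)', re.IGNORECASE)

def split_band(text):
    """Return [label, low, high], or None when the text does not match."""
    m = BAND_RE.search(text)
    return list(m.groups()) if m else None

print(split_band("A1-A4 US (430 Mhz - 780 Mhz)"))  # ['A1-A4 US', '430 Mhz', '780 Mhz']
print(split_band("test1 (11 MHz - 11 MHz)"))       # ['test1', '11 MHz', '11 MHz']
print(split_band("no frequencies here"))           # None
```

Returning None for non-matching input avoids the IndexError that indexing an empty re.findall result with [0] would raise.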

Upvotes: 0

Mark Moretto

Reputation: 2348

Using named groups:

import re
sample = "A1-A4 US (430 Mhz - 780 Mhz)"

split_pat = r"""
    (?P<first>.+)               # Capture the label before the space and opening parenthesis
    \s\(                        # Skip the space and the opening parenthesis
    (?P<second>\d+\s\bMhz\b)    # Capture numeric values, space, and Mhz
    \s+?\-\s+?                  # Skip the hyphen and the whitespace around it
    (?P<third>\d+\s\bMhz\b)     # Capture numeric values, space, and Mhz
    \)                          # Match the closing parenthesis
    """

# Use re.X flag to handle verbose pattern string
p = re.compile(split_pat, re.X)

# Search once and reuse the match object (search returns None when nothing matches)
m = p.search(sample)
first_text = m.group('first')
second_text = m.group('second')
third_text = m.group('third')
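
With named groups, Match.groupdict() can return all three parts in one call. Below is a minimal sketch that assumes the same group names as above, with the pattern written on one line for brevity and the redundant \b and lazy quantifiers dropped. The variable name parts is only illustrative.

```python
import re

# Compact form of the named-group pattern; group names carried over from above.
p = re.compile(r"(?P<first>.+)\s\((?P<second>\d+\sMhz)\s+-\s+(?P<third>\d+\sMhz)\)")

m = p.search("A1-A4 US (430 Mhz - 780 Mhz)")
parts = m.groupdict() if m else {}  # empty dict when the input does not match
print(parts)  # {'first': 'A1-A4 US', 'second': '430 Mhz', 'third': '780 Mhz'}
```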

Upvotes: 0

WayToDoor

Reputation: 1750

This regex seems to do the job: ([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)

You can try it online

The website gives some code you can use for matching with Python:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"

test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
    "A7-A8 PS (420 Mhz - 180 Mhz)\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Group 0 is the whole match, so the capture groups are numbered from 1
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
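
If only the captured pieces are needed, and not their positions, the generated loop could probably be shortened with re.findall, which returns one tuple of groups per match. This is a sketch using the same regex and test string as above. The name rows is only illustrative.

```python
import re

regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"
test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
            "A7-A8 PS (420 Mhz - 180 Mhz)\n")

# findall yields a tuple of the three capture groups for every matching line
rows = [list(groups) for groups in re.findall(regex, test_str)]
print(rows)
# [['A1-A4 US', '430 Mhz', '780 Mhz'], ['A7-A8 PS', '420 Mhz', '180 Mhz']]
```

Note that \d* also accepts zero digits, so " Mhz" on its own would match. Using \d+ is the stricter choice if a number must always be present.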

Upvotes: 0
