Suzanne

Reputation: 754

Regex: combining two groups

Test string:

First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here

I want to return a single group "MICKEY MOUSE"

I have:

 (?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))

Group 2 returns MICKEY and group 5 returns MOUSE.

I thought that enclosing them in a single group and making the middle cruft and Last name segments non-capturing groups with ?: would prevent them from appearing. But Group 1 returns

MICKEY One to four lines of cruft go here Last Name: MOUSE

How can I get it to remove the middle stuff from what's returned (or, alternatively, combine groups 2 and 5 into a single named or numbered group)?

Upvotes: 1

Views: 19749

Answers (3)

Bilal Mahmoud

Reputation: 26

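A capture group always returns one contiguous span of the match, so non-capturing groups nested inside group 1 cannot remove text from it. Declaring the inner cruft groups as non-capturing with (?:) does simplify the numbering, though: the first name becomes group 2 and the last name group 3, and you can join those two yourself.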

After modifying the regex to:

(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))

you can do the following in Python:

import re

inp = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
query = r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))'
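# re.search, not re.match: inp starts with a newline, so the pattern cannot match at position 0
match = re.search(query, inp)
# group 1 still spans the cruft, so join only groups 2 and 3
output = ' '.join(match.group(2, 3))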
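
If the goal is a single string rather than a single group, the match object can also build it from a template. A minimal sketch, assuming the same pattern and a shortened copy of the test string (the variable names are illustrative, not from the question); Match.expand() substitutes group references the same way re.sub() does:

```python
import re

# Shortened version of the test string from the question (assumed layout).
text = "First\nName:\nMICKEY\nOne to\nfour lines\nLast\nName:\nMOUSE\nMore cruft\n"

pattern = r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))'
match = re.search(pattern, text)

# Group 1 is one contiguous span, so it still contains the cruft.
# expand() fills a template with only the groups we want.
full_name = match.expand(r'\2 \3') if match else None
print(full_name)
```

Checking for None before calling expand() avoids an AttributeError when the input has no matching record.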

Upvotes: 1

RomanPerekhrest

Reputation: 92854

With the re.search() function and a pattern that uses named groups:

import re

s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''

result = re.search(r'Name:\n(?P<firstname>\S+)[\s\S]*Name:\n(?P<lastname>\S+)', s).groupdict()
print(result)

The output:

{'firstname': 'MICKEY', 'lastname': 'MOUSE'}

----------

Or, even simpler, with the re.findall() function:

result = re.findall(r'(?<=Name:\n)(\S+)', s)
print(result)

The output:

['MICKEY', 'MOUSE']
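
Either result can then be joined into the single string the question asks for. A rough sketch, assuming each record holds exactly one first name and one last name; the helper name full_name is hypothetical. It swaps in a non-greedy [\s\S]*? so that, if the text held several records, the match would stop at the nearest following Name: rather than the last one:

```python
import re

def full_name(text):
    """Return 'FIRST LAST' for the first record in text, or None (hypothetical helper)."""
    m = re.search(
        r'Name:\n(?P<firstname>\S+)[\s\S]*?Name:\n(?P<lastname>\S+)', text
    )
    if m is None:
        return None
    # Named groups can be read by indexing the match object.
    return f"{m['firstname']} {m['lastname']}"

s = "First\nName:\nMICKEY\nOne to\nfour lines\nLast\nName:\nMOUSE\nMore cruft"
print(full_name(s))
```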

Upvotes: 1

Ajax1234

Reputation: 71461

You can split the string into lines and keep only the lines that consist entirely of uppercase letters:

import re
s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
final_data = ' '.join(i for i in s.split('\n') if re.findall('^[A-Z]+$', i))

Output:

'MICKEY MOUSE'

Or, a pure regex solution:

new_data = ' '.join(re.findall(r'(?<=\n)[A-Z]+(?=\n)', s))

Output:

'MICKEY MOUSE'
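
The same whole-line check can also be written with line anchors. A minimal sketch, assuming each name occupies a line by itself and contains only the letters A-Z: with re.MULTILINE, ^ and $ match at the start and end of every line, so a cruft line that merely ends in a capital letter is not picked up by accident.

```python
import re

s = "First\nName:\nMICKEY\nOne to\nfour lines\nLast\nName:\nMOUSE\nMore cruft\n"

# re.MULTILINE makes ^ and $ anchor at every line boundary,
# so only lines made up entirely of A-Z are returned.
names = re.findall(r'^[A-Z]+$', s, flags=re.MULTILINE)
new_data = ' '.join(names)
print(new_data)
```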

Upvotes: 0
