Reputation: 2217

Searching with Regex in Python

I'm having a hard time understanding Regular Expressions in Python.

else:
    #REGEX1
    ret = re.search(r'name:(?P<scname>)',line) 
    if(ret != None):
        print('The  name is'+ret.group("scname"))
    else:
    #REGEX2
    ret = re.search(r'(?P<content>)',line)
    print('The content is'+ret.group("content"))

I'm parsing a text file with the following content

name:english
1001Nights 
A Night at the Call Center
Grammar
name:science
Engineering
Biology
Physics
name:maths
Algebra
Geometry

I want the output to be

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology

Please help me correct my regex, and suggest any link that would help me understand regular expressions more easily. The official documentation feels a bit overwhelming since I'm new to Python.

UPDATE

This is the error I get, if it helps:

The subclient name is
Traceback (most recent call last):
  File "create&&bkp.py", line 32, in <module>
    print('The subclient name is'+ret.group("scname"))
IndexError: no such group

Upvotes: 0

Answers (4)

Padraic Cunningham

Reputation: 180522

You don't need a regex if your file is in the format posted:

with open("in.txt") as f:
    for line in f:
        if "name:" in line:
            print("The name is {}".format(line.rstrip().split("name:",1)[1]))
        else:
            print("The content is {}".format(line.rstrip()))

Output:

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology
The content is Physics
The name is maths
The content is Algebra
The content is Geometry

Upvotes: 1

coo

Reputation: 114

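else:
    #REGEX1
    ret = re.search(r'name:(.*)$', line)
    if ret is not None:
        print('The name is ' + ret.group(1))
    else:
        #REGEX2 is not needed: any line that does not match REGEX1 is content
        # ret = re.search(r'(?P<content>)',line)
        # assuming the line came from a file, strip its trailing newline
        print('The content is ' + line.rstrip('\n'))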

Upvotes: 0

Yann Vernier

Reputation: 15887

ret = re.search(r'name:(?P<scname>)',line)

This searches for 'name:' somewhere in the line (not necessarily at the beginning), and if found, produces a match object with a group at the position after the colon. Since there's nothing between the > and ), this group is empty, but it does have the name scname. Thus the code snippet you've shown doesn't match the error. Other mismatches include the printing of part of the string before the error and the word "subclient".
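
A minimal sketch of that behaviour, using one line from the question's file. The `.*` variant is one possible fix, and the group name "subclient" is only a hypothetical stand-in for whatever name the failing script actually asked for:

```python
import re

line = "name:english"

# Original pattern: the named group has an empty body, so it captures ''.
empty = re.search(r'name:(?P<scname>)', line)
print(repr(empty.group("scname")))

# With .* inside the group, the rest of the line is captured.
fixed = re.search(r'name:(?P<scname>.*)', line)
print(repr(fixed.group("scname")))

# Requesting a group name the pattern does not define raises IndexError.
# "subclient" is a made-up name used only to trigger the error.
try:
    empty.group("subclient")
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

So an "IndexError: no such group" points at a mismatch between the group name in the pattern and the name passed to group(), not at the empty group itself.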

I would consider simple string processing:

for line in lines:
    line=line.rstrip('\n')    # assuming it came from a file, remove newline
    if line.startswith('name:'):
        print('The name is '+line[len('name:'):])
    else:
        print('The content is '+line)

It's also possible to do the entire classification using the regex:

matcher=re.compile(r'^(name:(?P<name>.*)|(?P<content>.*))$')
for line in lines:
    m=matcher.match(line)
    for key,value in m.groupdict().items():
        if value is not None:
            print('The {} is {}'.format(key,value))

Upvotes: 2

vks

Reputation: 67988

(?<=:)(.*)$

This would be your regex1. See demo.

http://regex101.com/r/iZ9sO5/8

^(?!.*?:)(.*)$

This would be your regex2. See demo.

http://regex101.com/r/iZ9sO5/9
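
A rough sketch of how these two patterns could be used from Python. The sample lines come from the question, while the variable names and the loop are assumptions added for illustration:

```python
import re

lines = ["name:english", "1001Nights", "A Night at the Call Center"]

name_re = re.compile(r'(?<=:)(.*)$')        # regex1: text following a colon
content_re = re.compile(r'^(?!.*?:)(.*)$')  # regex2: whole line, only if it contains no colon

results = []
for line in lines:
    m = name_re.search(line)
    if m:
        results.append('The name is ' + m.group(1))
        continue
    m = content_re.search(line)
    if m:
        results.append('The content is ' + m.group(1))

print("\n".join(results))
```

Note that regex1 matches after any colon, not only after "name:", so a content line that happens to contain a colon would be reported as a name.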

Upvotes: 0
