user1457123

Reputation: 65

Python RegEx String Parsing with inconsistent data

I have a string that I need to extract values from. The problem is that the string's format is inconsistent. Here's an example of the script that contains the string.

import re

RAW_Data = "Name Multiple Words Zero Row* (78.59/0) Name Multiple Words2* (96/24.56) Name Multiple Words3* (0/32.45) Name Multiple Words4* (96/12.58) Name Multiple Words5* (96/0) Name Multiple Words Zero Row6* (0) Name Multiple Words7* (96/95.57) Name Multiple Words Zero Row8* (0) Name Multiple Words9*"

First_Num = re.findall(r'\((.*?)\/*', RAW_Data)
Seg_Length = re.findall(r'\/(.*?)\)', RAW_Data)
#WithinParenthesis = re.findall(r'\((.*?)\)', RAW_Data) #This works correctly

print(First_Num)
print(Seg_Length)

del RAW_Data

What I need to get out of the string are all the values within the parentheses. However, I need some logic that handles the absence of the "/" between the numbers: if the "/" doesn't exist, both First_Num and Seg_Length should be "0". I hope this makes sense.
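A quick check of why the two patterns in the script don't line up. This is a sketch assuming Python 3, run against a shortened copy of the sample string. The first pattern ends in `\/*`, which can match zero slashes, so the lazy `(.*?)` is satisfied by an empty string. The second pattern only matches groups that really contain "/", so the "(0)" entries are skipped and the list ends up shorter than the number of parenthesized groups.

```python
import re

# Shortened version of RAW_Data from the question (assumption: same format).
sample = "Name Words* (78.59/0) Name Words2* (96/24.56) Name Words Zero Row3* (0) Name Words4*"

# `\/*` may match zero slashes, so the lazy `(.*?)` captures "" every time.
First_Num = re.findall(r'\((.*?)\/*', sample)
# Only parentheses that contain "/" can match, so "(0)" is skipped.
Seg_Length = re.findall(r'\/(.*?)\)', sample)
# The pattern the question marks as working: everything between "(" and ")".
WithinParenthesis = re.findall(r'\((.*?)\)', sample)

print(First_Num)          # ['', '', '']
print(Seg_Length)         # ['0', '24.56']
print(WithinParenthesis)  # ['78.59/0', '96/24.56', '0']
```

Because the two lists have different lengths, their entries can no longer be paired up by position.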

Upvotes: 1

Views: 164

Answers (2)

Jan

Reputation: 43169

Use a simple regex and add some programming logic:

import re
rx = r'\(([^)]+)\)'
string = """Name Multiple Words Zero Row* (78.59/0) Name Multiple Words2* (96/24.56) Name Multiple Words3* (0/32.45) Name Multiple Words4* (96/12.58) Name Multiple Words5* (96/0) Name Multiple Words Zero Row6* (0) Name Multiple Words7* (96/95.57) Name Multiple Words Zero Row8* (0) Name Multiple Words9*"""

for match in re.finditer(rx, string):
    # Split the text captured inside the parentheses on "/".
    parts = match.group(1).split('/')
    First_Num = parts[0]
    try:
        Seg_Length = parts[1]
    except IndexError:
        # No "/" inside the parentheses, e.g. "(0)".
        Seg_Length = None

    print("First_Num, Seg_Length:", First_Num, Seg_Length)

You might get by with a regex-only solution (e.g. a conditional regex), but this approach is more likely to still be understandable three months from now. See a demo on ideone.com.
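For comparison, a regex-only sketch, assuming Python 3. It uses an optional group rather than a true conditional pattern: group 1 is the text before "/", and group 2 is the text after it, or `None` when the parentheses contain no "/". The pattern and the "0"/"0" fallback are an illustration of the question's requirement, not part of the answer above, and the sample string is a shortened copy of the question's.

```python
import re

# Hypothetical single pattern: "(" + text without "/" or ")" + optional "/rest" + ")".
rx = re.compile(r'\(([^/)]+)(?:/([^)]+))?\)')

string = "Name Multiple Words Zero Row* (78.59/0) Name Multiple Words2* (96/24.56) Name Multiple Words Zero Row6* (0) Name Multiple Words7* (96/95.57)"

pairs = []
for match in rx.finditer(string):
    first, seg = match.groups()
    if seg is None:
        # No "/" inside the parentheses: both values become "0", as the question asks.
        first, seg = "0", "0"
    pairs.append((first, seg))

for first, seg in pairs:
    print("First_Num, Seg_Length:", first, seg)
```

The trade-off is the one the answer mentions: the pattern is shorter to use but harder to read later than a plain `split('/')`.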

Upvotes: 1

Joe

Reputation: 2447

You are trying to find values on each side of a '/' that you know may not exist. Fall back to the condition you know always holds for your initial search: use a regular expression with findall to get all the data within the parentheses, then process each value based on whether '/' is in it.
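A minimal sketch of this two-step approach, assuming Python 3 and a shortened copy of the question's string. It uses the parenthesis pattern the question already reports as working, then checks `'/' in value` and applies the question's rule (both values become "0" when there is no "/"). Because every parenthesized group produces exactly one entry in each list, the two lists stay aligned.

```python
import re

RAW_Data = "Name Multiple Words Zero Row* (78.59/0) Name Multiple Words2* (96/24.56) Name Multiple Words5* (96/0) Name Multiple Words Zero Row6* (0) Name Multiple Words7* (96/95.57)"

# Step 1: the condition that always holds, i.e. everything between "(" and ")".
values = re.findall(r'\((.*?)\)', RAW_Data)

First_Num = []
Seg_Length = []
for value in values:
    # Step 2: branch on whether the value actually contains "/".
    if '/' in value:
        first, seg = value.split('/', 1)
    else:
        # Requirement from the question: both values are "0" when "/" is missing.
        first, seg = "0", "0"
    First_Num.append(first)
    Seg_Length.append(seg)

print(First_Num)   # ['78.59', '96', '96', '0', '96']
print(Seg_Length)  # ['0', '24.56', '0', '0', '95.57']
```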

Upvotes: 0
