baobobs

Reputation: 703

Find characters within substring using Python regex

I have the following list:

l = ['(PREDIR )?NAME SUFTYP|PREDIR NAME( SUFTYP)?', '(PREDIR )?NAME|PREDIR NAME', '(PREDIR )?PRETYP NAME SUFTYP( SUFDIR)?|PREDIR (PRETYP )?NAME( SUFTYP)? SUFDIR', '(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME', 'NAME SUFTYP( SUFDIR)?|NAME( SUFTYP)? SUFDIR', 'NAME SUFTYP|NAME( SUFTYP)?', 'NAME|NAME', 'PRETYP NAME ( SUFDIR)?|(PRETYP )?NAME SUFDIR']

I want to find the items that contain ? on only one side of |, and replace each of them with just the side that contains ?.

Specifically, I want the items within l to be replaced as follows:

'(PREDIR )?NAME|PREDIR NAME' -> '(PREDIR )?NAME'

'(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME' -> '(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME'

'NAME SUFTYP|NAME( SUFTYP)?' -> 'NAME( SUFTYP)?'

The only way I can think of doing this is through an iterative process where I first check for ? on the left side and not the right side, and then vice versa.

The following does not work though.

for i in l:
    i = re.sub(r'(.*?\?.*?)(\|.*?[^?].*?)',r'\1',i)

Upvotes: 0

Views: 71

Answers (2)

Rohit Jain

Reputation: 213263

Try this out:

l = ['(PREDIR )?NAME SUFTYP|PREDIR NAME( SUFTYP)?', '(PREDIR )?NAME|PREDIR NAME', 
     '(PREDIR )?PRETYP NAME SUFTYP( SUFDIR)?|PREDIR (PRETYP )?NAME( SUFTYP)? SUFDIR', 
     '(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME', 
     'NAME SUFTYP( SUFDIR)?|NAME( SUFTYP)? SUFDIR', 'NAME SUFTYP|NAME( SUFTYP)?', 
     'NAME|NAME', 'PRETYP NAME ( SUFDIR)?|(PRETYP )?NAME SUFDIR']

import re

l2 = []
for elem in l:
    inner = re.split(r"\|", elem)  # raw string so `\|` is a literal pipe

    left = '?' in inner[0]
    right = '?' in inner[1]

    if (left and right) or not (left or right): 
        # Either both sides of `|` have `?` or neither side does: keep the item as is
        l2.append(elem)
    elif left:
        l2.append(inner[0])
    else:
        l2.append(inner[1])

print(l2)
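If a regex is still preferred, the same rule can be sketched with two anchored patterns. This is only an illustration under one assumption: every item contains exactly one `|`. The names `LEFT_ONLY`, `RIGHT_ONLY` and `fix_with_regex` are made up for this example. The character class `[^?|]` is what enforces "no ? on this side", which a single `[^?]` surrounded by `.*?` cannot do.

```python
import re

# Assumption: each item has exactly one "|" separating two alternatives.
# Left side contains "?", right side contains none: keep the left side.
LEFT_ONLY = re.compile(r'^([^|]*\?[^|]*)\|[^?|]*$')
# Right side contains "?", left side contains none: keep the right side.
RIGHT_ONLY = re.compile(r'^[^?|]*\|([^|]*\?[^|]*)$')


def fix_with_regex(s):
    # At most one of the two patterns can match a given item;
    # if neither matches, the item is returned unchanged.
    s = LEFT_ONLY.sub(r'\1', s)
    return RIGHT_ONLY.sub(r'\1', s)


samples = [
    '(PREDIR )?NAME|PREDIR NAME',
    '(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME',
    'NAME SUFTYP|NAME( SUFTYP)?',
    'NAME|NAME',
]
for item in samples:
    print(fix_with_regex(item))
```

The split-based loop above is arguably easier to read; the regex version mainly shows what the pattern has to express.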

Upvotes: 1

DSM

Reputation: 353079

So if I understand you, you want to split the string by |, and if exactly one part has a ? in it, then return that part, and otherwise return the original string? I'm not sure regexes are worth the headache: why not

def fix(s):
    has_qmark = [part for part in s.split("|") if '?' in part]
    return has_qmark[0] if len(has_qmark) == 1 else s

instead? It's practically in English.

>>> fix('(PREDIR )?NAME|PREDIR NAME')
'(PREDIR )?NAME'
>>> fix('(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME')
'(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME'
>>> fix('NAME SUFTYP|NAME( SUFTYP)?')
'NAME( SUFTYP)?'
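To apply this to the whole list, a list comprehension is probably the simplest route. The sketch below repeats `fix` so that it runs on its own; the name `fixed` is just an illustrative choice. Note that rebinding the loop variable inside a `for` loop, as in the question, would not change the list itself, so building a new list avoids that problem.

```python
def fix(s):
    # Keep only the parts that contain "?"; if exactly one does, return it.
    has_qmark = [part for part in s.split("|") if '?' in part]
    return has_qmark[0] if len(has_qmark) == 1 else s


l = ['(PREDIR )?NAME SUFTYP|PREDIR NAME( SUFTYP)?',
     '(PREDIR )?NAME|PREDIR NAME',
     '(PREDIR )?PRETYP NAME SUFTYP( SUFDIR)?|PREDIR (PRETYP )?NAME( SUFTYP)? SUFDIR',
     '(PREDIR )?PRETYP NAME|PREDIR (PRETYP )?NAME',
     'NAME SUFTYP( SUFDIR)?|NAME( SUFTYP)? SUFDIR',
     'NAME SUFTYP|NAME( SUFTYP)?',
     'NAME|NAME',
     'PRETYP NAME ( SUFDIR)?|(PRETYP )?NAME SUFDIR']

# Build a new list instead of reassigning the loop variable.
fixed = [fix(item) for item in l]

for before, after in zip(l, fixed):
    if before != after:
        print(repr(before), '->', repr(after))
```

With the list from the question, only the second and sixth items change; the rest have ? on both sides or on neither.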

Upvotes: 1
