Sean
Sean

Reputation: 1313

How to search for repetitions in strings in Python

So i need a way for python to basically detect the difference between a string that looks like this:

W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9

and this:

A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8

Is there any function that can be used to detect that in the first string, there are 2 inner parenthesis following each other, whereas in the second string, the first inner parenthesis is eventually followed by a closed parenthesis? It would be best if it is not a .re regular expression. Thanks!

Edit: I am dealing with ANY case of parenthesis, anything from:

A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8,(T:0.6,V:0.7)S:0.8,(D:0.6,Y:0.7)P:0.8,(X:0.6,L:0.7)M:0.8

ANY infinite amount on inner 2 child strings... To:

W:1.0,X:1.1(U:5.0(I:9.0)N:8.0,(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)R:3.4(O:5.5)P:3.0)Y:0.9

A highly complicated multiple child-fielded string, that can contain any infinite amount of children with their own children

Upvotes: 0

Views: 211

Answers (5)

Hugh Bothwell
Hugh Bothwell

Reputation: 56654

s = 'W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9'

def max_depth(s, start_ch='(', end_ch=')'):
    depth = 0
    best = 0
    for ch in s:
        if ch == start_ch:
            depth += 1
            best = max(depth, best)
        elif ch == end_ch:
            depth -= 1
            if depth < 0:
                raise ValueError('illegal string - unmatched close-paren')
    if depth:
        raise ValueError('illegal string - unmatched open-paren')
    return best

print max_depth(s)    # => 2

Upvotes: 2

Luke
Luke

Reputation: 11644

Here's an awkward numpy approach. It's basically the same as astay13's suggestion, but should be fast for large datasets. If the data is so large that you run out of memory, then it would have to be processed in chunks.

>>> import numpy as np
>>> a = 'W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9'
>>> arr = np.fromstring(a, dtype=np.ubyte)
>>> parens = (arr==ord('(')).astype(int) - (arr==ord(')')) ## search for parens

>>> parens   ## 1 marks location of opening paren, -1 marks closing paren
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0, -1,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0, -1,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0])

>>> parens[1:] += parens[:-1]  ## compute the nesting level at each character position
>>> parens   
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

Upvotes: 1

jfs
jfs

Reputation: 414395

To determine that a line has no nested parentheses (examples 2, 3 in your question):

re.match(r'(?: [^)]  |  \( [^)]* \) )*$', line, re.X)

i.e., a line is a non-paren character or non-nested expression in parens repeated zero or more times.

Upvotes: 0

astay13
astay13

Reputation: 6927

You could walk the string character by character and count the number of begin vs. end parentheses.

As Tim noted in the comment, you should have logic that identifies when you have more end parentheses than begin parentheses.

Upvotes: 1

EEP
EEP

Reputation: 725

A regular expression would be a more elegant way to do this, it could be as simple as re.search(r'\(.*?\(.*?\)', string) This would inform you if the string has two open parens before a close paren.

If you don't want to use these however you could iterate over the characters in the string and if you encounter two open parens without a close paren, handle it then

Upvotes: 0

Related Questions