Reputation: 1313
So i need a way for python to basically detect the difference between a string that looks like this:
W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9
and this:
A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8
Is there any function that can be used to detect that in the first string, there are 2 inner parenthesis following each other, whereas in the second string, the first inner parenthesis is eventually followed by a closed parenthesis? It would be best if it is not a .re regular expression. Thanks!
Edit: I am dealing with ANY case of parenthesis, anything from:
A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8,(T:0.6,V:0.7)S:0.8,(D:0.6,Y:0.7)P:0.8,(X:0.6,L:0.7)M:0.8
ANY infinite amount on inner 2 child strings... To:
W:1.0,X:1.1(U:5.0(I:9.0)N:8.0,(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)R:3.4(O:5.5)P:3.0)Y:0.9
A highly complicated multiple child-fielded string, that can contain any infinite amount of children with their own children
Upvotes: 0
Views: 211
Reputation: 56654
s = 'W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9'
def max_depth(s, start_ch='(', end_ch=')'):
depth = 0
best = 0
for ch in s:
if ch == start_ch:
depth += 1
best = max(depth, best)
elif ch == end_ch:
depth -= 1
if depth < 0:
raise ValueError('illegal string - unmatched close-paren')
if depth:
raise ValueError('illegal string - unmatched open-paren')
return best
print max_depth(s) # => 2
Upvotes: 2
Reputation: 11644
Here's an awkward numpy approach. It's basically the same as astay13's suggestion, but should be fast for large datasets. If the data is so large that you run out of memory, then it would have to be processed in chunks.
>>> import numpy as np
>>> a = 'W:1.0,X:1.1(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5,(F:0.6,G:0.7)H:0.8)Y:0.9'
>>> arr = np.fromstring(a, dtype=np.ubyte)
>>> parens = (arr==ord('(')).astype(int) - (arr==ord(')')) ## search for parens
>>> parens ## 1 marks location of opening paren, -1 marks closing paren
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0])
>>> parens[1:] += parens[:-1] ## compute the nesting level at each character position
>>> parens
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
Upvotes: 1
Reputation: 414395
To determine that a line has no nested parentheses (examples 2, 3 in your question):
re.match(r'(?: [^)] | \( [^)]* \) )*$', line, re.X)
i.e., a line is a non-paren character or non-nested expression in parens repeated zero or more times.
Upvotes: 0
Reputation: 6927
You could walk the string character by character and count the number of begin vs. end parentheses.
As Tim noted in the comment, you should have logic that identifies when you have more end parentheses than begin parentheses.
Upvotes: 1
Reputation: 725
A regular expression would be a more elegant way to do this, it could be as simple as re.search(r'\(.*?\(.*?\)', string)
This would inform you if the string has two open parens before a close paren.
If you don't want to use these however you could iterate over the characters in the string and if you encounter two open parens without a close paren, handle it then
Upvotes: 0