Reputation: 875
I'm trying to parse strings in Python. I have posted a couple of questions on Stack Overflow, and I'm basically trying to combine the different ways of parsing the strings I'm working with into a single piece of code.
Here's a code snippet that works fine on its own for parsing the following two string formats:
from __future__ import generators
from pprint import pprint
s2="<one><two><three> an.attribute ::"
s1="< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
def parse(s):
    for t in s.split('<'):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
pprint(list(parse(s1)))
pprint(list(parse(s2)))
Here's the output I get. It's in the format I need, with each attribute stored at its own index:
['one',
 'two',
 'three',
 "here's one attribute",
 'six : 10.3',
 'seven : 8.5',
 'eight : 90.1',
 'nine : 8.7']
['one', 'two', 'three', 'an.attribute ::']
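To make the two-stage split in the working snippet concrete, here is a small trace. The helper trace() is hypothetical (not part of the original code); it just records, for each piece of s.split(sep), what splitting it once on the first '>' produces.

```python
def trace(s, sep='<'):
    # Hypothetical helper: for each piece of s.split(sep), record the piece
    # and the result of splitting it once more on the first '>'.
    return [(t, t.strip().split('>', 1)) for t in s.split(sep)]

for piece, parts in trace("<one><two><three> an.attribute ::"):
    print(repr(piece), '->', parts)
```

The generator then strips each part and drops the empty strings, which is how it arrives at ['one', 'two', 'three', 'an.attribute ::'].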
Next, I tried to incorporate the same code into a function that can parse four string formats, but for some reason it doesn't work there, and I can't figure out why.
Here's the combined code in its entirety:
from __future__ import generators
import re
import string
from pprint import pprint
temp=[]
y=[]
s2="< one > < two > < three > an.attribute ::"
s1="< one > < two > < three > here's an attribute < four : 6.5 > < five : 7.5 > < six : 8.5 > < seven : 9.5 >"
t2="< one > < two > < three > < four : 220.0 > < five : 6.5 > < six : 7.5 > < seven : 8.5 > < eight : 9.5 > < nine : 6 - 7 >"
t3="One : two : three : four Value : five Value : six Value : seven Value : eight Value :"
def parse(s):
    c=s.count('<')
    print c
    if c==9:
        res = re.findall('< (.*?) >', s)
        return res
    elif (c==7|c==3):
        temp=parsing(s)
        pprint(list(temp))
        #pprint(list(parsing(s)))
    else:
        res=s.split(' : ')
        res = [item.strip() for item in s.split(':')]
        return res
def parsing(s):
    for t in s.split(' < '):
        for u in t.strip().split('>',1):
            if u.strip(): yield u.strip()
    pprint(list((s)))
Now when I run the code and call parse(s1), I get the following output:
7
["< one > < two > < three > here's an attribute < four",
 '6.5 > < five',
 '7.5 > < six',
 '8.5 > < seven',
 '9.5 >']
Similarly, on calling parse(s2), I get:
3
['< one > < two > < three > an.attribute', '', '']
Why is the string split inconsistently when it is parsed? I'm using the same code in both places.
Could someone help me figure out why this is happening? :)
Upvotes: 0
Views: 128
Reputation: 1125398
You are using the binary | (bitwise OR) operator where you should be using the boolean or operator instead:
elif (c==7|c==3):
should be
elif c==7 or c==3:
or perhaps:
elif c in (3, 7):
which is faster to boot.
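Applying that fix, a minimal sketch of a corrected parse could look like the following. It is an illustration under stated assumptions, not the asker's final code: it makes every branch return a list (the original elif branch only printed), and it splits on '<' as the working stand-alone snippet does. The combined version's parsing() splits on ' < ' instead, which would still leave a stray '<' on the first item ('< one') once that branch is actually reached.

```python
import re

def parsing(s):
    # Same logic as the working stand-alone snippet: split on '<', then split
    # each piece once on '>' and keep the stripped, non-empty parts.
    for t in s.split('<'):
        for u in t.strip().split('>', 1):
            if u.strip():
                yield u.strip()

def parse(s):
    c = s.count('<')
    if c == 9:
        # Every field is wrapped in "< ... >".
        return re.findall(r'< (.*?) >', s)
    elif c in (3, 7):  # membership test instead of the bitwise |
        return list(parsing(s))
    else:
        # Any other count (e.g. t3, which has no '<'): split on colons.
        return [item.strip() for item in s.split(':')]

print(parse("< one > < two > < three > an.attribute ::"))
```

With this version, parse(s1) returns ['one', 'two', 'three', "here's an attribute", 'four : 6.5', 'five : 7.5', 'six : 8.5', 'seven : 9.5'] instead of falling through to the colon split.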
Because | has a higher precedence than the == comparison operator (while or has a lower one), the first statement was interpreted as the chained comparison (c == (7 | c) == 3), which means c == (7 | c) and (7 | c) == 3. Here 7 | c performs a bitwise OR, and for any non-negative c the result is at least 7, so it can never equal both c and 3. The expression therefore always returns False:
>>> c = 7
>>> (c==7|c==3)
False
>>> c = 3
>>> (c==7|c==3)
False
>>> c==7 or c==3
True
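To check the precedence explanation directly, here is a short sketch using the standard ast module to inspect how Python parses the buggy condition without evaluating it, followed by a brute-force check over a range of non-negative counts (the range 0 to 999 is an arbitrary choice for illustration).

```python
import ast

# Parse the buggy condition into a syntax tree without evaluating it.
tree = ast.parse('c==7|c==3', mode='eval').body

# It is a single Compare node with two == operators (a chained comparison),
# and its middle operand is the bitwise OR 7|c.
print(type(tree).__name__, len(tree.ops))
print(type(tree.comparators[0].op).__name__)

for c in range(1000):
    assert (c==7|c==3) == (c == (7 | c) == 3)  # same meaning
    assert not (c==7|c==3)                     # never True
    assert (c==7 or c==3) == (c in (3, 7))     # the two fixes agree
```

The first print shows Compare 2 and the second shows BitOr, matching the reading c == (7 | c) == 3.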
Upvotes: 2