Reputation: 73
I have a text file that contains the following:
Number1 (E, P) (F, H)
Number2 (A, B) (C, D)
Number3 (I, J) (O, Z)
I know more or less how to read it and how to get its values into my program, but I wanted to know how to correctly split each line into "Number1", "(E, P)" and "(F, H)". Later, I also want to be able to check in my program whether "Number1" contains "(E, P)" or not.
def read_srg(name):
    filename = name + '.txt'
    fp = open(filename)
    lines = fp.readlines()
    R = {}
    for line in lines:
        parts = line.split()  # ??? this also splits inside the parentheses
    fp.close()
    return R
Upvotes: 0
Views: 73
Reputation: 458
Because of the whitespace inside the parentheses, you're better off using a regular expression than just splitting the lines.
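To see the problem concretely, here is a small sketch (using the first sample line from the question) comparing what a plain `str.split()` produces with what the regular expression used below captures:

```python
import re

line = "Number1 (E, P) (F, H)"

# str.split() breaks on every run of whitespace, including the
# space after the comma inside each pair of parentheses.
naive = line.split()
print(naive)  # ['Number1', '(E,', 'P)', '(F,', 'H)']

# The regex keeps each parenthesised pair together as one group.
pattern = r'(Number[0-9]+) (\([A-Z,\s]+\)) (\([A-Z,\s]+\))'
parts = re.match(pattern, line).groups()
print(parts)  # ('Number1', '(E, P)', '(F, H)')
```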
Here's your read_srg function, with the regex check integrated:
import re

def read_srg(name):
    with open('%s.txt' % (name, ), 'r') as text:
        matchstring = r'(Number[0-9]+) (\([A-Z,\s]+\)) (\([A-Z,\s]+\))'
        R = {}
        for i, line in enumerate(text):
            match = re.match(matchstring, line)
            if not match:
                # report lines that don't fit the expected format and move on
                print('skipping malformed line %d: %s' % (i + 1, line.rstrip()))
                continue
            key, v1, v2 = match.groups()
            R[key] = v1, v2
    return R

from pprint import pformat
print(pformat(read_srg('example')))
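If you want to try the pattern without creating example.txt first, a minimal sketch like the following applies the same regex to the three sample lines from the question, held in a list. The names `sample_lines` and `parse_lines` are hypothetical and only for illustration:

```python
import re
from pprint import pprint

# Same pattern as in read_srg above.
MATCHSTRING = r'(Number[0-9]+) (\([A-Z,\s]+\)) (\([A-Z,\s]+\))'

def parse_lines(lines):
    """Hypothetical helper: builds the same dict as read_srg, from any iterable of lines."""
    R = {}
    for i, line in enumerate(lines):
        match = re.match(MATCHSTRING, line)
        if not match:
            print('skipping malformed line %d: %s' % (i + 1, line.rstrip()))
            continue
        key, v1, v2 = match.groups()
        R[key] = v1, v2
    return R

sample_lines = [
    "Number1 (E, P) (F, H)\n",
    "Number2 (A, B) (C, D)\n",
    "Number3 (I, J) (O, Z)\n",
]
result = parse_lines(sample_lines)
pprint(result)
```

Keeping the parsing separate from the file handling like this also makes the function easy to test, since any list of strings (or an open file) can be passed in.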
To read your dictionary and perform checks on keys and values, you can later do something like:
test_dict = read_srg('example')
for key, (v1, v2) in test_dict.items():
    matchstring = ''
    # compare the key exactly, so that e.g. 'Number10' is not treated as 'Number1'
    if key == 'Number1' and '(E, P)' in (v1, v2):
        matchstring = 'match found: '
    print('%s%s > %s %s' % (matchstring, key, v1, v2))
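Because read_srg stores the two groups for each key as a tuple, you can also check a single key directly instead of looping over the whole dictionary. A small sketch, assuming `test_dict` has the shape read_srg returns (here filled in by hand with two entries from the question):

```python
# Same shape as the dict returned by read_srg: key -> (first pair, second pair).
test_dict = {
    'Number1': ('(E, P)', '(F, H)'),
    'Number2': ('(A, B)', '(C, D)'),
}

# Exact key lookup plus tuple membership: True only if '(E, P)' is one
# of the two pairs stored for 'Number1'.
has_pair = '(E, P)' in test_dict.get('Number1', ())
print(has_pair)  # True

# .get() with an empty tuple as default avoids a KeyError for missing keys.
print('(E, P)' in test_dict.get('Number9', ()))  # False
```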
A big advantage of this approach is that you can also use your regex to check that your file isn't malformed for some reason. This is why the matching rule is quite strict:
matchstring = r'(Number[0-9]+) (\([A-Z,\s]+\)) (\([A-Z,\s]+\))'
- (Number[0-9]+) will only match words made of Number followed by one or more digits
- (\([A-Z,\s]+\)) will only match strings enclosed in () that contain nothing but capital letters, commas or whitespace

I read in your comment that the format of the file is always the same, so I'm assuming it is procedurally generated. Still, you might want to check its integrity (or make sure your code does not break if the procedure generating the txt file changes its formatting at some point). Depending on how strict you want your sanity check to be, you can push the above even further:
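- if you know that the number after Number never has more than three digits, you might change (Number[0-9]+) to (Number[0-9]{1,3}) (which limits the match to 1, 2 or 3 digits)
- if you know that each pair of parentheses always holds two single capital letters separated by ", ", you can change (\([A-Z,\s]+\)) to (\([A-Z], [A-Z]\))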
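To see what the stricter rules buy you, here is a sketch that runs the original pattern and a stricter one (combining both suggestions above) against a few made-up lines. Only the first line is well formed; the other three are hypothetical examples of a changed or broken format:

```python
import re

LOOSE = r'(Number[0-9]+) (\([A-Z,\s]+\)) (\([A-Z,\s]+\))'
# Stricter variant: at most 3 digits, and exactly "(X, Y)" with single capital letters.
STRICT = r'(Number[0-9]{1,3}) (\([A-Z], [A-Z]\)) (\([A-Z], [A-Z]\))'

lines = [
    "Number1 (E, P) (F, H)",     # well formed
    "Number1234 (E, P) (F, H)",  # too many digits
    "Number2 (EE, P) (C, D)",    # two letters in one slot
    "Number3 (I,J) (O, Z)",      # missing space after the comma
]

results = {}
for line in lines:
    # (does the loose pattern match?, does the strict pattern match?)
    results[line] = (re.match(LOOSE, line) is not None,
                     re.match(STRICT, line) is not None)
    print(line, '-> loose:', results[line][0], 'strict:', results[line][1])
```

All four lines pass the loose pattern, but only the first passes the strict one, so the stricter version is the better choice if you want read_srg to flag format changes instead of silently accepting them.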
Upvotes: 1
Reputation: 336108
I think the easiest/most reliable way would be to use a regex:
import re

regex = re.compile(r"([^()]*) (\([^()]*\)) (\([^()]*\))")
with open("myfile.txt") as text:
    for line in text:
        contents = regex.match(line)
        if contents:
            label, g1, g2 = contents.groups()
            # now do something with these values, e.g. add them to a list
Explanation:
([^()]*) # Match any number of characters besides parentheses --> group 1
[ ] # Match a space
(\([^()]*\)) # Match (, then any non-parenthesis characters, then ) --> group 2
[ ] # Match a space
(\([^()]*\)) # Match (, then any non-parenthesis characters, then ) --> group 3
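The explanation above is already written in the style of a verbose regex, so as a sketch you can keep it right next to the pattern by compiling with re.VERBOSE (which ignores whitespace and # comments outside character classes, so the literal space has to be written as [ ]):

```python
import re

regex = re.compile(r"""
    ([^()]*)        # any characters except parentheses --> group 1
    [ ]             # a single space
    (\([^()]*\))    # (, non-parenthesis characters, ) --> group 2
    [ ]             # a single space
    (\([^()]*\))    # (, non-parenthesis characters, ) --> group 3
""", re.VERBOSE)

label, g1, g2 = regex.match("Number1 (E, P) (F, H)").groups()
print(label, g1, g2)  # Number1 (E, P) (F, H)

# A line with only one pair of parentheses does not match at all.
incomplete = regex.match("Number1 (E, P)")
print(incomplete)  # None
```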
Upvotes: 6
Reputation: 15204
You were really close. Try this:
def read_srg(name):
    with open(name + '.txt', 'r') as f:
        R = {}
        for line in f:
            if not line.strip():
                continue  # skip blank lines; the unpacking below needs at least one item
            line = line.replace(', ', ',')  # Number1 (E, P) (F, H) -> Number1 (E,P) (F,H)
            header, *contents = line.strip().split()  # `header` gets the first item of the list and all the rest go to `contents`
            R[header] = contents
    return R
Checking for membership can later be done like so:
R = read_srg('example')
if "(E,P)" in R["Number1"]:
    pass  # do stuff
I did not test this but it should be fine. Let me know if anything comes up.
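As a quick check, here is a minimal sketch that runs essentially the same replace-then-split logic (plus a blank-line check) on the question's sample data. It uses io.StringIO in place of the real file so it runs on its own; the helper name parse_srg is made up for this example:

```python
import io

def parse_srg(f):
    """Hypothetical helper: same idea as read_srg above, but takes an open file object."""
    R = {}
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which would break the unpacking below
        line = line.replace(', ', ',')  # "(E, P)" -> "(E,P)"
        header, *contents = line.strip().split()
        R[header] = contents
    return R

sample = io.StringIO(
    "Number1 (E, P) (F, H)\n"
    "Number2 (A, B) (C, D)\n"
    "Number3 (I, J) (O, Z)\n"
)
R = parse_srg(sample)
print(R['Number1'])              # ['(E,P)', '(F,H)']
print("(E,P)" in R["Number1"])   # True
print("(E, P)" in R["Number1"])  # False: the space was removed while parsing
```

Note that the membership test has to use the form without the space, because the space after the comma is stripped during parsing.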
Upvotes: 0