Reputation: 323
Could anybody kindly help me with the following example? If I use re.DOTALL, the match runs until the end of the file:
import re
text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s"
names = ['A', 'K']
for name in names:
    print name
    print re.findall("Found to {0} from:\n\t\-(.+)".format(name), text)
The text looks like this:
Found to A from:
	-B
	-C
Found to K from:
	-B
	-D
	-E
Found to A from:
	-D
Max time: 20s
OUTPUT:
A
['B', 'D']
K
['B']
Desired OUTPUT:
A
['B', 'C', 'D']
K
['B', 'D', 'E']
Upvotes: 2
Views: 525
Reputation: 4504
And here is another approach (Python 2.7x):
import re
text = 'Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print name
    print [ n for i in re.findall('(?:Found to ' + name + ' from:)(?:\\n\\t-([A-Z]))(?:\\n\\t-([A-Z]))?(?:\\n\\t-([A-Z]))?', text) for n in i if n ]
Output:
A
['B', 'C', 'D']
K
['B', 'D', 'E']
UPDATE: If you don't know in advance how many (?:\n\t-([A-Z])) groups follow each header, I suggest the following approach:
import re
text = 'Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print name
    groups = re.findall('(?:Found to ' + name + ' from:)((?:\\n\\s*-(?:[A-Z]))+)', text)
    print reduce(lambda i,j: i + j, map(lambda x: re.findall('\n\s*-([A-Z])', x), groups))
Output:
A
['B', 'C', 'G', 'D']
K
['B', 'D', 'E']
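For anyone on Python 3 (where print is a function and reduce is no longer a builtin), here is a rough sketch of the same two-step idea: first capture the whole run of item lines after each header, then pull the letters out of each run. The helper name items_for is made up for this example, and itertools.chain.from_iterable stands in for the reduce call, so treat it as one possible port rather than the answer's exact code.

```python
import re
from itertools import chain

text = ("Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\n"
        "Found to A from:\n\t-D\nMax time: 20s")

def items_for(name, text):
    # items_for is a hypothetical helper name, not part of the original answer.
    # Step 1: capture the whole run of "\n<whitespace>-X" lines after each header.
    pattern = r"Found to " + re.escape(name) + r" from:((?:\n\s*-[A-Z])+)"
    blocks = re.findall(pattern, text)
    # Step 2: extract the individual letters from every captured run and flatten.
    return list(chain.from_iterable(re.findall(r"\n\s*-([A-Z])", b) for b in blocks))

for name in ["A", "K"]:
    print(name)
    print(items_for(name, text))
```

One small difference: chain.from_iterable returns an empty list when the name never appears, whereas reduce without an initial value raises a TypeError on an empty sequence.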
Upvotes: 4
Reputation: 2465
Not generic, but it works in your case, is simple, and uses findall as you mentioned.
import re
text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n"
names = ['A', 'K']
for name in names:
    print name
    test = re.findall("Found to {0} from:\n\t-([A-Z])(\n\t)?-?([A-Z])?(\n\t)?-?([A-Z])?".format(name), text)
    # normalize it
    prettyList = []
    for (a,b,c,d,e) in test:
        prettyList.append(a)
        prettyList.append(c)
        prettyList.append(e)
    print [x for x in prettyList if x]
The output
A
['B', 'C', 'D']
K
['B', 'D', 'E']
I know this only handles up to three elements per header; for longer lists you have to add more optional groups to the pattern.
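If the number of elements is not known, one way around adding more groups is to match headers and items with a single alternation under re.MULTILINE and keep track of the current header while walking the matches. This is a different technique from the pattern above, written for Python 3, and the function name group_items is just an assumption for the sketch. It also assumes every item line starts with a tab followed by a dash, as in the sample text.

```python
import re
from collections import defaultdict

text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n"

def group_items(text):
    # group_items is a hypothetical name used only for this sketch.
    groups = defaultdict(list)
    current = None
    # Each match is a (header, item) tuple; exactly one of the two is non-empty.
    pattern = r"^Found to (\w+) from:$|^\t-(.+)$"
    for header, item in re.findall(pattern, text, flags=re.MULTILINE):
        if header:
            current = header              # a new "Found to X from:" line
        elif current is not None:
            groups[current].append(item)  # an item line under the current header
    return dict(groups)

result = group_items(text)
for name in ["A", "K"]:
    print(name)
    print(result.get(name, []))
```

Lines that match neither alternative (such as a trailing "Max time: 20s") are simply skipped.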
Upvotes: 0
Reputation: 78650
When I was typing this answer I was trying to answer your original question where you had a file with specific content to parse. I think my answer still applies. If you have a string instead, change
for line in f:
to
for line in f.splitlines():
and pass the string instead of the file object to keys_and_values.
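One thing to watch with the string variant: a file object is its own iterator, so the second loop continues where the first one stopped, but each call to splitlines() returns a fresh list that starts again from the first line. A sketch of how a string version might look with a single shared iterator (Python 3; the parameter name lines, the TARGET constant, and the sample text are assumptions for this example):

```python
import re
from collections import OrderedDict

TARGET = re.compile(r'^\s*Found to [A-Z] from:\s*$')

def keys_and_values(lines):
    # Wrap the input in one iterator so both loops consume the same stream.
    lines = iter(lines)
    key = None
    # Discard any headers before the first "Found to X from:" line.
    for line in lines:
        if TARGET.match(line.strip()):
            key = line.strip()[9]
            break
    # Yield (key, value) tuples for the remaining non-empty lines.
    for line in lines:
        line = line.strip()
        if TARGET.match(line):
            key = line[9]
        elif line:
            yield (key, line)

text = ("some\nheader\nFound to A from:\n\t-B\n\t-C\nFound to K from:\n"
        "\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n")

result = OrderedDict()
for k, v in keys_and_values(text.splitlines()):
    result.setdefault(k, []).append(v)

for k in result:
    print('{}\n{}\n'.format(k, result[k]))
```

Note that strip() only removes whitespace, so with this sample text the values keep their leading dash ('-B', '-C', ...).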
Original answer:
In all honesty I think this looks like a task where the heavy lifting should be done by a generator, with some help from regular expressions.
import re
from collections import OrderedDict
def keys_and_values(f):
    # discard any headers
    target = r'^\s*Found to [A-Z] from:\s*$'
    for line in f:
        if re.match(target, line.strip()):
            break
    # yield (key, value) tuples
    key = line.strip()[9]
    for line in f:
        line = line.strip()
        if re.match(target, line):
            key = line[9]
        elif line:
            yield (key, line)

result = OrderedDict()
with open('testfile.txt') as f:
    for k,v in keys_and_values(f):
        result.setdefault(k, []).append(v)

for k in result:
    print('{}\n{}\n'.format(k, result[k]))
Demo:
$ cat testfile.txt
some
useless
header
lines
Found to A from:
B
C
Found to K from:
B
D
E
Found to A from:
D
$ python parsefile.py
A
['B', 'C', 'D']
K
['B', 'D', 'E']
Upvotes: 2