Reputation: 1478
I am using re.findall to split text into the chunks between $...$ markers and the markers themselves.
I started with the pattern r'(.*?)(\$.*?\$)', but it doesn't return the text after the last marker, so I lose the trailing '\n6\n'.
How do I get the last piece of text?
Here is my Python code:
#!/usr/bin/env python
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData, flags=re.DOTALL):
    print(repr(record))
The output I get for this is:
('\n1\n2\n3 here Some text in here \n', '$file1.txt$', '')
('\n4 Some text in here and more ', '$file2.txt$', '')
('\n5 Some text ', '$file3.txt$', '')
(' here \n', '$file3.txt$', '')
('', '', '\n6\n')
('', '', '')
('', '', '')
This is the output I would like:
('\n1\n2\n3 here Some text in here \n', '$file1.txt$')
('\n4 Some text in here and more ', '$file2.txt$')
('\n5 Some text ', '$file3.txt$')
(' here \n', '$file3.txt$')
('\n6\n', '')
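One way to get exactly these pairs (a sketch that swaps in re.split instead of findall; it is not taken from the answers below) relies on the fact that when the split pattern contains a capturing group, re.split keeps the matched markers in its result. The list then alternates text, marker, text, ..., and always ends with the trailing text, so the '\n6\n' is never lost. Without re.DOTALL, `.*?` will not let a marker span a newline.

```python
import re

allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''

# With a capturing group, re.split keeps each $...$ marker in the result:
# [text, marker, text, marker, ..., text]. The list always has odd length.
parts = re.split(r'(\$.*?\$)', allData)

# Pair every text chunk with the marker that follows it; the last chunk
# has no marker after it, so pad the marker list with ''.
records = list(zip(parts[0::2], parts[1::2] + ['']))

for record in records:
    print(repr(record))
```

If the data ends with a marker, the final record is ('', ''), which you can drop if you don't need it.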
Background, in case you need the larger picture: I'm rewriting the approach from the thread linked below in Python. I have the rest of the code under control; I'm just getting too much out of findall.
https://discussions.apple.com/message/21202021#21202021
Upvotes: 0
Views: 474
Reputation: 25207
Here's one way to solve your substitution problem with findall:
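import re
def readfile(name):
    with open(name) as f:
        return f.read()
# Group 1: the filename inside a $...$ marker.
# Group 2: a lone '$' or a run of text without '$' (copied through unchanged).
r = re.compile(r"\$(.+?)\$|(\$|[^$]+)")
print("".join(readfile(filename) if filename else text
              for filename, text in r.findall(allData)))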
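To see what this findall call actually yields, here is a sketch that replaces file reading with a dictionary lookup so it runs without any files on disk; FAKE_FILES, MARKER_OR_TEXT and expand are hypothetical names. Each match fills exactly one of the two groups: either a filename, or a chunk of plain text (including a lone '$' that is not part of a marker).

```python
import re

# Hypothetical stand-in for the files on disk: filename -> contents.
FAKE_FILES = {
    "file1.txt": "<file1>",
    "file2.txt": "<file2>",
}

# Same pattern as the answer: group 1 is a $name$ marker, group 2 is a
# lone '$' or a run of text containing no '$'.
MARKER_OR_TEXT = re.compile(r"\$(.+?)\$|(\$|[^$]+)")


def expand(data, contents):
    """Replace each $name$ marker with contents[name]; keep other text as-is."""
    return "".join(contents[name] if name else text
                   for name, text in MARKER_OR_TEXT.findall(data))


sample = "a $file1.txt$ b $file2.txt$ c\n"
print(MARKER_OR_TEXT.findall(sample))
print(expand(sample, FAKE_FILES))
print(expand("cost $5", FAKE_FILES))  # an unpaired '$' passes through unchanged
```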
Upvotes: 1
Reputation: 4903
If I understand correctly from that Apple link you want to do something like:
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
def read_file(m):
    # m.group(1) is the filename captured between the $ signs
    with open(m.group(1)) as f:
        return f.read()
# Sloppy matching :D
# print(re.sub(r"\$(.*?)\$", read_file, allData))
# More precise.
print(re.sub(r"\$(file\d+?\.txt)\$", read_file, allData))
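If some $...$ names might not correspond to existing files, one option (an assumption about what you want, not part of this answer) is to have the replacement callback leave the marker untouched instead of raising an error. The sketch below uses a hypothetical dictionary, FAKE_FILES, in place of the files so it can run standalone.

```python
import re

# Hypothetical stand-in for files on disk: filename -> contents.
FAKE_FILES = {"file1.txt": "<contents of file1>", "file2.txt": "<contents of file2>"}

MARKER = re.compile(r"\$(file\d+\.txt)\$")


def substitute(data, contents):
    # re.sub calls the lambda once per match; returning m.group(0)
    # (the whole $...$ marker) leaves unknown names unchanged.
    return MARKER.sub(lambda m: contents.get(m.group(1), m.group(0)), data)


print(substitute("a $file1.txt$ b $file9.txt$ c", FAKE_FILES))
```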
EDIT: As Oscar suggested, I made the match more precise.
In other words, you want to take the filename between the $ signs and replace the marker with that file's contents, which is what the code above does.
Example output:
1
2
3 here Some text in here
I'am file1.txt
4 Some text in here and more
I'am file2.txt
5 Some text
I'am file3.txt
here
I'am file3.txt
6
Files:
==> file1.txt <==
I'am file1.txt
==> file2.txt <==
I'am file2.txt
==> file3.txt <==
I'am file3.txt
Upvotes: 2
Reputation: 23536
To get the output you want, you need to restrict your pattern to 2 capture groups: findall returns one tuple element per capturing group, so with 3 groups every "record" has 3 elements.
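A toy illustration of that rule (the patterns here are made up for the demo): the number of capturing groups fixes the width of every tuple findall returns, and groups that did not take part in a match come back as ''.

```python
import re

# Three capturing groups -> every result is a 3-tuple; groups that did not
# participate in a given match are reported as ''.
three = re.findall(r'(a)|(b)|(c)', 'abc')
print(three)

# A single capturing group -> findall returns plain strings, not tuples.
one = re.findall(r'(a)b', 'abab')
print(one)
```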
You could make the second group optional; that should do the job:
r'([^$]*)(\$.*?\$)?'
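A quick check of this pattern on the question's data, as a sketch: because both groups can match the empty string, findall also returns a final ('', '') match at the end of the string, which the list comprehension below filters out.

```python
import re

ALL_DATA = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''

# Group 1: a run of text without '$'; group 2 (optional): a $...$ marker.
PATTERN = re.compile(r'([^$]*)(\$.*?\$)?')

raw = PATTERN.findall(ALL_DATA)
# Drop the empty ('', '') match that findall reports at the end of the string.
records = [rec for rec in raw if rec != ('', '')]

for rec in records:
    print(repr(rec))
```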
Upvotes: 1
Reputation: 26333
This partly solves your problem:
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    print([x for x in record if x])
which produces this output:
['1\n2\n3 here Some text in here \n', '$file1.txt$']
['\n4 Some text in here and more ', '$file2.txt$']
['\n5 Some text ', '$file3.txt$']
[' here \n', '$file3.txt$']
['\n6']
[]
You can avoid the last empty list with:
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    fields = [x for x in record if x]  # drop the unmatched (empty) groups
    if fields:  # skip the empty match at the end of the string
        print(fields)
Upvotes: 0