Reputation: 3898
I have the following file:
this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc...
Starting from "now it is the third line" to "but now it's the fifth line", how do I copy those three lines (without knowing the line numbers of those lines)? In Perl, you would do something like:
/^now it is/../^but now/
What is the equivalent in Python?
I have (which obviously only grabs 1 of the lines):
regex = re.compile("now it is")
for line in content:
    if regex.match(line):
        print line
Or
reg = re.compile(r"now it is.*but now it.*", re.MULTILINE | re.DOTALL)
matches = reg.search(urllib2.urlopen(url).read())
for match in matches.group():
    print match
This prints:
n
o
w
i
t
i
s
.
.
.
I.e., it returns characters and not the complete line.
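The character-by-character output comes from iterating over matches.group(), which is a single string, and iterating over a string yields one character at a time. A minimal sketch of the difference, assuming Python 3 and an inline sample string instead of the URL (both are assumptions, not part of the original attempt):

```python
import re

text = (
    "this is the first line\n"
    "now it is the third line\n"
    "wow, the fourth line\n"
    "but now it's the fifth line\n"
    "etc...\n"
)

# [^\n]* (instead of a trailing .*) stops the match at the end of the last line.
match = re.search(r"now it is.*but now it[^\n]*", text, re.DOTALL)

block = match.group()            # one str holding the whole matched block
first_items = list(block)[:3]    # iterating a str yields single characters
lines = block.splitlines()       # split explicitly to get whole lines

for line in lines:
    print(line)
```

Calling splitlines() on the matched string is what turns the block back into lines.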
Upvotes: 0
Views: 3100
Reputation: 137310
I think you just need to look at the re.MULTILINE flag. With it, ^ and $ match at the start and end of every line, so you can perform a similar match and get the text spanning the lines you want.
The complete solution uses the re.MULTILINE and re.DOTALL flags (the latter lets . match newlines), plus a non-greedy regular expression:
>>> text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""
>>> import re
>>> match = re.search('^(now it is.*?but now.*?)$', text, flags=re.MULTILINE|re.DOTALL)
>>> print match.group()
now it is the third line
wow, the fourth line
but now it's the fifth line
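The same idea can be wrapped in a small helper. This is only a sketch, assuming Python 3: the name extract_block is hypothetical, the markers are treated as literal text via re.escape, and, unlike the pattern above, the end marker is required to sit at the start of a line (closer to the Perl /^but now/). It returns an empty list rather than failing when nothing matches.

```python
import re

def extract_block(text, start, end):
    """Return the lines from the first line starting with `start`
    through the next line starting with `end` (hypothetical helper)."""
    pattern = re.compile(
        r"^" + re.escape(start)      # start marker at the beginning of a line
        + r".*?"                     # non-greedy; DOTALL lets it cross newlines
        + r"^" + re.escape(end)      # end marker at the beginning of a line
        + r"[^\n]*",                 # rest of the end line, no further
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(text)
    return match.group().splitlines() if match else []

text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""

block = extract_block(text, "now it is", "but now")
for line in block:
    print(line)
```

Using [^\n]* for the tail avoids relying on a second non-greedy group and $ to stop at the end of the line.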
Upvotes: 2
Reputation: 304137
You can easily make a generator to do this:
def re_range(f, re_start, re_end):
    for line in f:
        if re_start.match(line):
            yield line
            break
    for line in f:
        yield line
        if re_end.match(line):
            break
And you can call it like this:
import re
re_start = re.compile("now it is")
re_end = re.compile("but now")
with open('in.txt') as f:
    for line in re_range(f, re_start, re_end):
        print line,
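One caveat with this generator: both loops have to pull from the same iterator. A file object is its own iterator, so the second loop resumes where the first stopped, but a plain list would restart from its first element. A sketch of a variant that calls iter() once so lists work too, assuming Python 3 and an in-memory io.StringIO standing in for in.txt (both assumptions):

```python
import io
import re

def re_range(lines, re_start, re_end):
    it = iter(lines)  # one shared iterator, so the second loop resumes
    for line in it:
        if re_start.match(line):
            yield line
            break
    for line in it:
        yield line
        if re_end.match(line):
            break

re_start = re.compile("now it is")
re_end = re.compile("but now")

sample = (
    "this is the first line\n"
    "and this is the second line\n"
    "now it is the third line\n"
    "wow, the fourth line\n"
    "but now it's the fifth line\n"
    "etc...\n"
)

from_file = list(re_range(io.StringIO(sample), re_start, re_end))
from_list = list(re_range(sample.splitlines(True), re_start, re_end))

for line in from_file:
    print(line, end="")  # lines already end with a newline
```

If the start pattern never matches, the first loop exhausts the iterator and the generator yields nothing.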
Upvotes: 2
Reputation: 41428
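Something like this?
import re
# Compile the patterns once, outside the loop.
start = re.compile("now it is")
end = re.compile("but now")
valid = False
for line in open("/path/to/file.txt", "r"):
    if start.match(line):
        valid = True
    if valid:
        print line,  # trailing comma: the line already ends with a newline
    # Test the end pattern after printing so the end line itself is included.
    if valid and end.match(line):
        valid = False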
This way you hold only one line in memory at a time, unlike readlines(), which loads the whole file into memory.
This assumes the regex patterns are unique within your text block. If that is not the case, please give more detail on exactly how you identify the start line and the end line.
If you only need to check the beginning of the line for a match, it's even easier:
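valid = False
for line in open("/path/to/file.txt", "r"):
    if line.startswith("now it is"):
        valid = True
    if valid:
        print line,  # trailing comma: the line already ends with a newline
    # Test the end marker after printing so the end line itself is included.
    if valid and line.startswith("but now"):
        valid = False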
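If the start and end markers can occur more than once, the flag approach extends naturally to yield every block, which is roughly how Perl's range operator behaves. A minimal sketch, assuming Python 3; the name flip_flop and the extra sample lines are made up for illustration:

```python
import re

def flip_flop(lines, is_start, is_end):
    """Yield every line in each start..end block, both ends included."""
    inside = False
    for line in lines:
        if not inside and is_start(line):
            inside = True
        if inside:
            yield line
            if is_end(line):  # checked after yielding, so the end line is kept
                inside = False

sample = [
    "this is the first line",
    "now it is the third line",
    "wow, the fourth line",
    "but now it's the fifth line",
    "etc...",
    "now it is a second block",
    "but now it ends",
    "etc...",
]

start = re.compile("now it is").match
end = re.compile("but now").match
blocks = list(flip_flop(sample, start, end))

for line in blocks:
    print(line)
```

Passing the predicates as callables means the same function works with compiled regexes, str.startswith checks, or any other test.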
Upvotes: 1
Reputation: 1305
f = open("yourfile")  # the name of your file, with extension, in quotes
f = f.readlines()
Now f is a list of the lines in the file: f[0] is the first line, f[1] the second, and so on. To grab the third through fifth lines, use f[2:5].
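Slicing needs the indices, which the question says are not known in advance. One way to find them first is sketched below, assuming Python 3; slice_between is a hypothetical helper, and running to the end of the list when no end line is found is a design choice made here, not something the answer specifies.

```python
def slice_between(lines, start_prefix, end_prefix):
    """Return lines from the first one starting with start_prefix
    through the next one starting with end_prefix (inclusive)."""
    start = next(
        (i for i, line in enumerate(lines) if line.startswith(start_prefix)),
        None,
    )
    if start is None:
        return []  # no start line found
    end = next(
        (i for i in range(start, len(lines)) if lines[i].startswith(end_prefix)),
        None,
    )
    if end is None:
        return lines[start:]  # assumption: no end line means "to the end"
    return lines[start:end + 1]

lines = [
    "this is the first line\n",
    "and this is the second line\n",
    "now it is the third line\n",
    "wow, the fourth line\n",
    "but now it's the fifth line\n",
    "etc...\n",
]

wanted = slice_between(lines, "now it is", "but now")
print("".join(wanted), end="")
```

Like readlines(), this keeps the whole file in memory, so it suits small files best.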
Upvotes: 1