user_78361084

Reputation: 3898

How do I copy multiple lines?

I have the following file:

this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc...

Starting from "now it is the third line" to "but now it's the fifth line", how do I copy those three lines (without knowing the line numbers of those lines)? In Perl, you would do something like:

/^now it is/../^but now/

What is the equivalent in Python?

I have (which obviously only grabs 1 of the lines):

regex = re.compile("now it is")
for line in content:
    if regex.match(line):
        print line

Or

reg = re.compile(r"now it is.*but now it.*", re.MULTILINE | re.DOTALL)

matches = reg.search(urllib2.urlopen(url).read())
for match in matches.group():
    print match

This prints:

n
o
w

i
t

i
s

.
.
.

I.e., it returns characters and not the complete line.

Upvotes: 0

Views: 3100

Answers (4)

Tadeck

Reputation: 137310

I think you just need to look at the re.MULTILINE flag. With it, ^ and $ match at the start and end of every line, so you can perform a similar match and get the text spanning the lines you want.

The complete solution uses the re.MULTILINE and re.DOTALL flags (the latter lets . match newlines), plus a non-greedy regular expression:

>>> text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""
>>> import re
>>> match = re.search('^(now it is.*?but now.*?)$', text, flags=re.MULTILINE|re.DOTALL)
>>> print match.group()
now it is the third line
wow, the fourth line
but now it's the fifth line
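
As for why the loop in the question printed single characters: match.group() returns one string, and iterating over a string yields its characters. A minimal sketch of getting lines back instead, written for Python 3 and reusing the sample text from above (the variable names are only illustrative):

```python
import re

text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""

# Non-greedy match from the start line through the end of the end line.
pattern = re.compile(r"^now it is.*?but now.*?$", re.MULTILINE | re.DOTALL)

match = pattern.search(text)
# search() returns None when nothing matches, so guard before using it.
block = match.group() if match else ""

# group() is a single string; looping over it directly gives characters.
# splitlines() turns it into a list of lines instead.
lines = block.splitlines()
for line in lines:
    print(line)
```

Note that this still needs the whole text in memory, so it suits a downloaded page better than a very large file.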

Upvotes: 2

John La Rooy

Reputation: 304137

You can easily make a generator to do this:

def re_range(f, re_start, re_end):
    for line in f:
        if re_start.match(line):
            yield line
            break
    for line in f:
        yield line
        if re_end.match(line):
            break

And you can call it like this:

import re

re_start = re.compile("now it is")
re_end = re.compile("but now")
with open('in.txt') as f:
    for line in re_range(f, re_start, re_end):
        print line,
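
One thing to be aware of: the generator above relies on f being an iterator (such as a file object), so that the second loop resumes where the first one stopped. If you pass a list, the second loop would start again from the top. A possible variant, sketched for Python 3, wraps the input in iter() so both cases behave the same; the sample text and the io.StringIO stand-in for a real file are assumptions for illustration:

```python
import io
import re


def re_range(lines, re_start, re_end):
    """Yield lines from the first re_start match through the first re_end match."""
    it = iter(lines)  # one shared iterator, so the second loop resumes after the first
    for line in it:
        if re_start.match(line):
            yield line
            break
    for line in it:
        yield line
        if re_end.match(line):
            break


re_start = re.compile("now it is")
re_end = re.compile("but now")

text = (
    "this is the first line\n"
    "and this is the second line\n"
    "now it is the third line\n"
    "wow, the fourth line\n"
    "but now it's the fifth line\n"
    "etc...\n"
)

# Works for a file-like object and for a plain list of lines alike.
from_file = list(re_range(io.StringIO(text), re_start, re_end))
from_list = list(re_range(text.splitlines(True), re_start, re_end))

for line in from_file:
    print(line, end="")  # lines already end with a newline
```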

Upvotes: 2

Charles Menguy

Reputation: 41428

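Something like this?

import re

start = re.compile("now it is")  # compile once, outside the loop
end = re.compile("but now")
valid = False
for line in open("/path/to/file.txt", "r"):
    if start.match(line):
        valid = True
    if valid:
        print line,  # trailing comma: the line already ends with a newline
    if end.match(line):
        valid = False  # reset only after the end line has been printed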

This way you're holding just one line in memory at a time, unlike readlines(), which loads the whole file into memory.

This assumes the start and end patterns each occur only once in your text. If that is not the case, please give more information about exactly how you identify the start line and the end line.

If you only need to check the beginning of the line for a match, it's even easier:

valid = False
for line in open("/path/to/file.txt", "r"):
    if line.startswith("now it is"):
        valid = True
    if valid:
        print line,  # print before resetting, so the end line is included
    if line.startswith("but now"):
        valid = False
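
If the start and end markers can occur more than once, the same flag idea can be wrapped in a generator that yields every block, which is roughly how Perl's range operator behaves. This is only a sketch for Python 3; the function name flagged_lines and the sample data are made up for illustration:

```python
def flagged_lines(lines, start_prefix, end_prefix):
    """Yield every line inside each start..end block, both ends included."""
    inside = False
    for line in lines:
        if not inside and line.startswith(start_prefix):
            inside = True
        if inside:
            yield line
            # Check the end marker after yielding so the end line is kept.
            if line.startswith(end_prefix):
                inside = False


sample = [
    "header\n",
    "now it is A\n",
    "middle\n",
    "but now B\n",
    "skip\n",
    "now it is C\n",
    "but now D\n",
    "tail\n",
]

picked = list(flagged_lines(sample, "now it is", "but now"))
for line in picked:
    print(line, end="")
```

Since it only looks at one line at a time, it can be fed an open file object just as well as a list.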

Upvotes: 1

purpleladydragons

Reputation: 1305

with open("yourfile") as f:  # that is, the name of your file, with extension, in quotes
    lines = f.readlines()

Now lines will be a list of each line in the file. lines[0] will be the first line, lines[1] the second, and so on. To grab the third to fifth lines you would use lines[2:5].
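
Since the question says the line numbers are not known in advance, the slice bounds can be looked up from the line contents first. A rough sketch for Python 3, with the sample text standing in for the result of readlines() (the variable names are illustrative only):

```python
text = (
    "this is the first line\n"
    "and this is the second line\n"
    "now it is the third line\n"
    "wow, the fourth line\n"
    "but now it's the fifth line\n"
    "etc...\n"
)
all_lines = text.splitlines(True)  # keeps the newlines, like readlines()

# Index of the first line that starts with the start marker.
# next() raises StopIteration if no line matches.
start = next(i for i, line in enumerate(all_lines)
             if line.startswith("now it is"))

# Index of the first end-marker line at or after the start line.
end = next(i for i, line in enumerate(all_lines)
           if i >= start and line.startswith("but now"))

wanted = all_lines[start:end + 1]  # +1 because slices exclude the stop index
for line in wanted:
    print(line, end="")
```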

Upvotes: 1
