Ashton

Reputation: 316

Splitting a file into lines in Python using re.split

I'm trying to split a file with a list comprehension using code similar to:

lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]

However, the lines list always has an empty string as its last element. Does anyone know a way to avoid this (other than the kludge of calling pop() afterwards)?
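
A minimal reproduction of the behaviour, for reference. The sample text is made up, and io.StringIO is only a stand-in for the real file object, so treat this as a sketch rather than the actual input:

```python
import io
import re

# Hypothetical stand-in for the real file; the content is invented for illustration.
sample = io.StringIO("alpha\ncomment\n\nbeta\n")

# Same expression as in the question, applied to the sample text.
lines = [x for x in re.split(r"\n+", sample.read()) if not re.match(r"com", x)]

print(lines)  # ['alpha', 'beta', '']
```

The text ends with a newline, so re.split sees a separator with nothing after it and returns an empty string as the final piece.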

Upvotes: 1

Views: 12189

Answers (4)

si28719e

Reputation: 2165

Another handy trick, especially when you need the line number, is to use enumerate:


fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
    dosomethingwith(n, line)

I only found out about enumerate quite recently, but it has come in handy quite a few times since then.
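
A small variation on the same idea, as a sketch: enumerate also accepts the file object itself, so readlines() is not strictly needed, and start=1 gives human-style line numbers. Here io.StringIO stands in for open("myfile.txt") and the sample text is invented:

```python
import io

# io.StringIO is a stand-in for open("myfile.txt"); the content is made up.
fp = io.StringIO("first\nsecond\nthird\n")

numbered = []
# Iterating the file object yields one line at a time; start=1 counts from 1.
for n, line in enumerate(fp, start=1):
    numbered.append((n, line.rstrip("\n")))

print(numbered)  # [(1, 'first'), (2, 'second'), (3, 'third')]
```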

Upvotes: 1

Ryan Ginstrom

Reputation: 14121

This should work, and eliminate the regular expressions as well:

all_lines = (line.rstrip()
             for line in open(filename)
             if "com" not in line)
# filter out the empty lines; list() makes the result a real list on Python 3 too
lines = list(filter(lambda x: x, all_lines))

Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids the trailing empty string, because splitlines() does not produce one (blank lines inside the file are still kept):

lines = [line
     for line in open(filename).read().splitlines()
     if "com" not in line]
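
To illustrate what splitlines() does with a trailing newline, here is a short sketch on an invented sample string (the text and variable names are assumptions, not taken from the question):

```python
# Made-up sample text with one blank line in the middle and a trailing newline.
text = "alpha\ncomment\n\nbeta\n"

# splitlines() does not emit a final empty string for the trailing newline,
# but it does keep the blank line in the middle.
kept = [line for line in text.splitlines() if "com" not in line]
print(kept)  # ['alpha', '', 'beta']

# Adding a truthiness check drops interior blank lines as well.
non_blank = [line for line in text.splitlines() if line and "com" not in line]
print(non_blank)  # ['alpha', 'beta']
```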

Upvotes: 0

John Fouhy

Reputation: 42193

Put the regular expression hammer away :-)

  1. You can iterate over a file directly; readlines() is almost obsolete these days.
  2. Read about str.strip() (and its friends, lstrip() and rstrip()).
  3. Don't use file as a variable name. It's bad form, because file is the name of a built-in type in Python 2.

You can write your code as:

lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())

If you are still getting blank lines in there, you can add in a test:

lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())

If you really want it in one line:

lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]

Finally, if you're on Python 2.6 or later, look at the with statement to improve things a little more.
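
A sketch of how that might look. The helper name read_lines and the temporary demo file are assumptions added for illustration only; the point is that the with block closes the file even if an error occurs partway through:

```python
import os
import tempfile


def read_lines(filename):
    """Return stripped, non-blank lines that do not start with 'com'."""
    # The with statement closes the file when the block exits.
    with open(filename) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("com")]


# Demo on a throwaway file with invented content.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\ncomment\n\nbeta\n")
    path = tmp.name

try:
    result = read_lines(path)
finally:
    os.remove(path)  # clean up the demo file

print(result)  # ['alpha', 'beta']
```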

Upvotes: 9

Alex

Reputation: 4362

lines = file.readlines()

edit: or if you didn't want blank lines in there, you can do

lines = list(filter(lambda a: a != '\n', file.readlines()))

edit^2: to remove trailing newlines, you can do

lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
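
For comparison, a sketch of the same result without re, using str.rstrip instead of re.sub. This is a swapped-in alternative rather than the code above; io.StringIO stands in for the real file and the sample text is made up:

```python
import io

# Stand-in for the real file object; content is invented for illustration.
fp = io.StringIO("alpha\n\nbeta\n")

# Skip lines that are only a newline, then strip the trailing newline from the rest.
lines = [line.rstrip("\n") for line in fp if line != "\n"]

print(lines)  # ['alpha', 'beta']
```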

Upvotes: 3
