Reputation: 79830
Is there an elegant way of skipping the first line of a file when using Python's fileinput module?
I have a data file with nicely formatted data, but the first line is a header. Using fileinput, I would have to include a check that discards a line if it does not seem to contain data.
The problem is that the same check would then be applied to every remaining line of the file.
With read() you can open the file, read the first line, and then loop over the rest of the file. Is there a similar trick with fileinput?
Is there an elegant way to skip processing of the first line?
Example code:
import fileinput
# how to skip first line elegantly?
for line in fileinput.input(["file.dat"]):
    data = process_line(line)
    output(data)
Upvotes: 14
Views: 16469
Reputation: 79
Use two loops, where the first one calls break immediately.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the line
        break
    for line in f:
        process(line)
Let's say you want to remove or empty the first 5 lines.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the first 5 lines
        if f.filelineno() == 5:
            break
    for line in f:
        process(line)
But if you only want to get rid of the first line, just call next before the loop.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    next(f)
    for line in f:
        process(line)
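For reference, here is a self-contained sketch of the next-based variant. With inplace=True, standard output is redirected into the file, so only what gets printed inside the loop ends up in the rewritten file. The function names and the scratch file are illustrative assumptions, not part of the answer above.

```python
import fileinput
import os
import tempfile


def drop_first_line_in_place(path):
    """Rewrite the file at `path` without its first line."""
    with fileinput.input(files=[path], inplace=True) as f:
        # Discard the header; the None default avoids StopIteration on an empty file.
        next(f, None)
        for line in f:
            # Printed text goes into the file because inplace=True redirects stdout.
            print(line, end="")


def demo_drop_first_line():
    """Create a scratch file, strip its header, and return the remaining text."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("header\n1\n2\n")
        drop_first_line_in_place(path)
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)
```

Calling demo_drop_first_line() should return only the data lines of the scratch file.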
Upvotes: 0
Reputation: 5149
One option is to use openhook:
The openhook, when given, must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object. You cannot use inplace and openhook together.
One could create a helper function skip_header and use it as the openhook, something like:
import fileinput
files = ['file_1', 'file_2']
def skip_header(filename, mode):
    f = open(filename, mode)
    next(f)  # consume the header line before fileinput sees the file
    return f
for line in fileinput.input(files=files, openhook=skip_header):
    # do something
    pass
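A runnable sketch of the openhook idea, assuming two small scratch files that each start with a header line. Because the hook runs once per file, every file's header is skipped. The names read_without_headers and demo_openhook, and the file contents, are made up for illustration.

```python
import fileinput
import os
import tempfile


def skip_header(filename, mode):
    """Open hook: open the file and consume its first line before returning it."""
    f = open(filename, mode)
    next(f, None)  # None default so an empty file does not raise StopIteration
    return f


def read_without_headers(paths):
    """Return the data lines from all `paths`, skipping each file's first line."""
    with fileinput.input(files=paths, openhook=skip_header) as stream:
        return [line.rstrip("\n") for line in stream]


def demo_openhook():
    """Build two hypothetical data files and read them through the hook."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, rows in (("file_1", ["a1", "a2"]), ("file_2", ["b1"])):
            path = os.path.join(tmp, name)
            with open(path, "w") as handle:
                handle.write("header\n" + "\n".join(rows) + "\n")
            paths.append(path)
        return read_without_headers(paths)
```

Since the header is consumed before fileinput reads the file, it is not included in fileinput's line counts.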
Upvotes: 0
Reputation: 1039
with open(file) as j:  # open file as j
    for i in j.readlines()[1:]:  # start reading j from the second line
        process_line(i)
Upvotes: -1
Reputation: 223032
import fileinput
lines = iter(fileinput.input(["file.dat"]))
next(lines)  # extract and discard first line
for line in lines:
    data = process_line(line)
    output(data)
Or use itertools.islice if you prefer:
import fileinput
import itertools
finput = fileinput.input(["file.dat"])
lines = itertools.islice(finput, 1, None)  # cuts off first line
dataset = (process_line(line) for line in lines)
results = [output(data) for data in dataset]
Since fileinput.input, islice, and the generator expression are all lazy, no intermediate list of lines is built; only the final results list is materialized.
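One caveat worth checking when several files are passed: islice slices the combined stream, so only the first line of the first file is dropped. A small sketch under that assumption, using made-up file names and contents (write_sample, lines_after_first, and demo_islice are illustrative helpers):

```python
import fileinput
import itertools
import os
import tempfile


def write_sample(directory, name, rows):
    """Write a hypothetical data file: one header line followed by `rows`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("header\n")
        for row in rows:
            handle.write(row + "\n")
    return path


def lines_after_first(paths):
    """Drop only the very first line of the combined input stream."""
    with fileinput.input(files=paths) as stream:
        return [line.rstrip("\n") for line in itertools.islice(stream, 1, None)]


def demo_islice():
    """Read two sample files; the second file's header is not skipped."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [
            write_sample(tmp, "a.dat", ["a1"]),
            write_sample(tmp, "b.dat", ["b1"]),
        ]
        return lines_after_first(paths)
```

For a single file, as in the question, this distinction does not matter.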
Upvotes: 18
Reputation: 4867
The fileinput module contains a bunch of handy functions, one of which seems to do exactly what you're looking for:
for line in fileinput.input(["file.dat"]):
    if not fileinput.isfirstline():
        data = process_line(line)
        output(data)
fileinput module documentation
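As a quick check of how isfirstline() behaves with more than one file, the sketch below reads two scratch files and keeps only the non-first lines. It uses the isfirstline() method on the object returned by fileinput.input rather than the module-level function; the helper names and file contents are assumptions for illustration.

```python
import fileinput
import os
import tempfile


def write_sample(directory, name, rows):
    """Write a hypothetical data file: one header line followed by `rows`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("header\n")
        for row in rows:
            handle.write(row + "\n")
    return path


def data_lines(paths):
    """Collect every line except the first line of each file."""
    rows = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if stream.isfirstline():
                continue  # the line just read is the first line of its file
            rows.append(line.rstrip("\n"))
    return rows


def demo_isfirstline():
    """Read two sample files and return their data rows without headers."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [
            write_sample(tmp, "a.dat", ["a1", "a2"]),
            write_sample(tmp, "b.dat", ["b1"]),
        ]
        return data_lines(paths)
```

The check runs on every line, which is the cost the question wanted to avoid, but it is a cheap comparison and it handles each file's header.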
Upvotes: 16
Reputation: 45344
It's right in the docs: http://docs.python.org/library/fileinput.html#fileinput.isfirstline
Upvotes: 5