Reputation: 413
I'm trying to take a file with a format like:
# Comments
# More comments
1,foo,bar,1
1,foo,bar,2
21,foo,bar,8
end_of_file
and process it into a list like:
listing = [[1,'foo','bar',1], [1,'foo','bar',2], [21,'foo','bar',8]]
Currently, I'm running something similar to:
listing = []
with open('foo_file.cfg', 'r') as f:
    for line in f:
        if line[0].isdigit():
            listing.append(line)  # I've also tried listing.append([line])
Obviously, I'm ending up with:
[['1,foo,bar,1'],['1,foo,bar,2'],['21,foo,bar,8']]
I know I can split the line by comma, rebuild a new list, then append the list to listing. I'm definitely willing to do that if it's the proper way, but I thought there might be something cleaner. I know the csv module would read the whole file into a proper format, but I'm not sure how it would deal with selectively removing certain data, such as the comments.
Upvotes: 1
Views: 44
Reputation: 107287
One Pythonic approach is to use itertools.dropwhile() to skip the leading lines that meet a certain condition. Since csv.reader objects are iterators, this avoids reading the whole file once and then looping over the lines again to filter them. Leading empty lines are skipped as well, because an empty row is falsy (the not x check in the lambda). Note that dropwhile() stops dropping at the first row that fails the condition, so every later line is passed through unchanged.
import csv
from itertools import dropwhile

with open('test.csv') as f:
    # skip leading empty rows and comment rows
    reader = dropwhile(lambda x: not x or x[0].startswith('#'), csv.reader(f))
    # print(list(reader))
    # [['1', 'foo', 'bar', '1'], ['1', 'foo', 'bar', '2'], ['21', 'foo', 'bar', '8']]
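Because dropwhile() only looks at the start of the input, a trailer such as the end_of_file line from the question would still come through. The following is a minimal sketch, not a definitive implementation: it assumes the sample file from the question (held in a StringIO here instead of a real file), and the names SAMPLE, is_header and parse_rows are hypothetical. It first shows what dropwhile() leaves behind, then filters on every row and converts the numeric fields.

```python
import csv
from io import StringIO
from itertools import dropwhile

# Assumed sample data, copied from the question.
SAMPLE = """# Comments
# More comments
1,foo,bar,1
1,foo,bar,2
21,foo,bar,8
end_of_file
"""


def is_header(row):
    """True for empty rows and rows whose first field starts with '#'."""
    return not row or row[0].startswith('#')


# dropwhile() only skips the leading rows; the trailer is still present.
after_header = list(dropwhile(is_header, csv.reader(StringIO(SAMPLE))))


def parse_rows(lines):
    """Keep rows whose first field is numeric and convert digit-only fields to int."""
    return [
        [int(field) if field.isdigit() else field for field in row]
        for row in csv.reader(lines)
        if row and row[0].isdigit()
    ]


listing = parse_rows(StringIO(SAMPLE))
print(listing)
```

To use this with a real file, pass the open file object to parse_rows() in place of the StringIO.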
Upvotes: 1
Reputation: 164613
This is one way with the csv module, which avoids explicitly handling some of the repetitive details (comma delimiter, newlines, etc.).
from io import StringIO
import csv
mystr = StringIO("""1,foo,bar,1
1,foo,bar,2
21,foo,bar,8""")
res = []
# replace mystr with open('file.csv', 'r')
with mystr as f:
    reader = filter(None, csv.reader(f))  # ignore empty lines
    for line in reader:
        if line[0].isdigit():
            res.append([int(line[0]), line[1], line[2], int(line[3])])
print(res)
[[1, 'foo', 'bar', 1],
[1, 'foo', 'bar', 2],
[21, 'foo', 'bar', 8]]
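If the column layout changes, the hard-coded int(line[0]) and int(line[3]) calls have to be edited by hand. One possible variation, offered only as a sketch, keeps one converter per column in a tuple. The names CONVERTERS and convert_row are hypothetical, and the four-column layout (int, str, str, int) is assumed from the sample data in the question.

```python
import csv
from io import StringIO

# Hypothetical: one converter per column, matching the sample layout.
CONVERTERS = (int, str, str, int)


def convert_row(row, converters=CONVERTERS):
    """Apply the converter for each column to the matching field."""
    return [conv(value) for conv, value in zip(converters, row)]


# Sample data from the question, including the trailer line.
data = StringIO("1,foo,bar,1\n1,foo,bar,2\n21,foo,bar,8\nend_of_file\n")

# Keep only non-empty rows whose first field is numeric, then convert them.
res = [convert_row(row) for row in csv.reader(data) if row and row[0].isdigit()]
print(res)
```

Note that zip() stops at the shorter of the two sequences, so a row with more fields than converters would be silently truncated in this sketch.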
Upvotes: 2
Reputation: 2424
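If the last line is the only one you want to get rid of, you could use pandas.read_csv with either the error_bad_lines=False argument or the skipfooter=1 argument. Newer pandas versions replace error_bad_lines with on_bad_lines='skip', and skipfooter is only supported by the Python parser engine (engine='python').
If it is necessary to loop through the lines of the file and check which lines to import, then I would just change the line that appends to the listing list to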
listing.append(line.strip().split(','))  # strip() removes the trailing newline
Upvotes: 1
Reputation: 26039
You could do this in a similar way without any module:
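lst = []
with open('foo_file.cfg') as f:
    for line in f:
        line = line.strip()  # drop the trailing newline
        # skip blank lines, comments and the end_of_file trailer
        if line and not line.startswith('#') and line != 'end_of_file':
            lst.append([int(i) if i.isdigit() else i for i in line.split(',')])
print(lst)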
# [[1, 'foo', 'bar', 1], [1, 'foo', 'bar', 2], [21, 'foo', 'bar', 8]]
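One caveat with the isdigit() test: it is False for strings such as '-5', so negative numbers would stay as strings. A small sketch of an alternative, assuming integer fields may carry a sign, is to attempt the conversion and fall back to the original text. The helper names to_int_or_str and parse_line are hypothetical.

```python
def to_int_or_str(text):
    """Return int(text) when it parses as an integer, otherwise the text itself."""
    try:
        return int(text)
    except ValueError:
        return text


def parse_line(line):
    """Split one comma-separated line and convert the integer fields."""
    return [to_int_or_str(part) for part in line.strip().split(',')]


print(parse_line("-5,foo,bar,12\n"))
```

This trades the explicit digit check for exception handling, which keeps the conversion rule in one place.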
Upvotes: 1