Reputation: 4758
Processing CSV files with csv.DictReader is great, but I have CSV files with comment lines (indicated by a hash at the start of a line), for example:
# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished
The csv module doesn't include any way to skip such lines.
I could easily do something hacky, but I imagine there's a nice way to wrap a csv.DictReader around some other iterator object that preprocesses the input and discards those lines.
Upvotes: 95
Views: 42524
Reputation: 2929
Based on the answers by sigvaldm and Leonid:
import csv

def is_comment(line):
    return line.startswith('#')

def is_whitespace(line):
    return line.isspace()

def decomment(csvfile):
    # Pass through only lines that are neither comments nor blank
    for row in csvfile:
        if not is_comment(row) and not is_whitespace(row):
            yield row

with open('dummy.csv') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)
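To check the behaviour without a file on disk, here is a small sketch that feeds the same decomment generator an in-memory io.StringIO (the sample text is made up for illustration). Comment lines and whitespace-only lines are dropped before csv.reader sees them. Note that a comment indented with spaces is not skipped, because startswith('#') only looks at the very first character.

```python
import csv
import io

def is_comment(line):
    return line.startswith('#')

def is_whitespace(line):
    return line.isspace()

def decomment(csvfile):
    # Pass through only lines that are neither comments nor blank
    for row in csvfile:
        if not is_comment(row) and not is_whitespace(row):
            yield row

# Hypothetical sample input: a leading comment, a blank line, a trailing comment
sample = io.StringIO("# step size=1.61853\nval0,val1\n\n1,2\n# adaptation finished\n")
rows = list(csv.reader(decomment(sample)))
print(rows)  # [['val0', 'val1'], ['1', '2']]
```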
Upvotes: 2
Reputation: 643
Good question. Python's csv module lacks basic support for comments (which are not uncommon at the top of CSV files). While Dan Stowell's solution works for the OP's specific case, it is limited in that # must be the first character on the line. A more generic solution would be:
import csv

def decomment(csvfile):
    # Cut each line at the first '#' and skip it if nothing is left
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

with open('dummy.csv') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)
As an example, the following dummy.csv file:
# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
prints:
['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']
Of course, this works just as well with csv.DictReader().
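As a quick check of the DictReader claim, this sketch runs the same decomment generator on the dummy.csv contents held in an io.StringIO instead of a file. DictReader takes its field names from the first surviving line, a,b,c, whose trailing comment has already been removed.

```python
import csv
import io

def decomment(csvfile):
    # Drop everything after the first '#', then skip lines that end up empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

# Same content as the dummy.csv example, held in memory
dummy = io.StringIO(
    "# comment\n"
    "# comment\n"
    "a,b,c # comment\n"
    "1,2,3\n"
    "10,20,30\n"
    "# comment\n"
)
records = list(csv.DictReader(decomment(dummy)))
for record in records:
    print(record)
# {'a': '1', 'b': '2', 'c': '3'}
# {'a': '10', 'b': '20', 'c': '30'}
```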
Upvotes: 29
Reputation: 151
Here is a bug fix for @sigvaldm's solution:
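import csv

def decomment(csvfile):
    # Use the text before '#' only to decide whether to skip the line;
    # yield the original, untruncated line
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield row

with open('dummy.csv') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)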
A valid CSV line can contain '#' characters inside quoted strings. The previous solution cut such strings off at the '#', because it yielded the truncated line instead of the original one.
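To see what the fix changes, here is a sketch comparing the two generators on a small made-up input; decomment_strip and decomment_keep are hypothetical names for the earlier version and the fixed one. The earlier version leaves a dangling quote on the line containing "#urgent", which csv.reader would then misparse, while the fixed version passes that line through intact. The trade-off is that the fixed version no longer removes trailing inline comments such as "# note": it only skips lines that are blank or contain nothing but a comment.

```python
import csv
import io

def decomment_strip(csvfile):
    # Earlier version: yields the line cut off at the first '#'
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

def decomment_keep(csvfile):
    # Fixed version: the cut-off text only decides whether to skip;
    # the untouched original line is what gets yielded
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield row

# Made-up sample: a quoted field containing '#' and a trailing inline comment
text = '# header comment\nid,tag\n1,"#urgent"\n2,plain # note\n'

stripped_lines = list(decomment_strip(io.StringIO(text)))
print(stripped_lines)  # ['id,tag', '1,"', '2,plain']

kept_rows = list(csv.reader(decomment_keep(io.StringIO(text))))
print(kept_rows)  # [['id', 'tag'], ['1', '#urgent'], ['2', 'plain # note']]
```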
Upvotes: -1
Reputation: 1435
Another way to read a CSV file is with pandas, whose read_csv function accepts a comment character.
Here's some sample code:
import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',              # field separator
                 comment='#',          # ignore everything from '#' to the end of the line
                 index_col=0,          # number or label of the index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn',  # replaces error_bad_lines/warn_bad_lines (removed in pandas 2.0)
                 ).sort_index()
print(df)

df.fillna('no value', inplace=True)  # replace NaN with 'no value'
print(df)
For this CSV file:
a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82
we will get this output:
b c d e
a
1 NaN 16 NaN 55
8 77.0 77 NaN 16
13 19.0 25 28.0 82
b c d e
a
1 no value 16 no value 55
8 77 77 no value 16
13 19 25 28 82
Upvotes: 15
Reputation: 4758
Actually this works nicely with filter:
import csv

with open('samples.csv') as fp:
    rdr = csv.DictReader(filter(lambda row: row[0] != '#', fp))
    for row in rdr:
        print(row)
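The filter approach can be tried without a file by wrapping an io.StringIO. The sample below is a trimmed, made-up version of the question's data, cut to three columns so that the header matches the rows. Because filter is lazy, the comment before the header is dropped and DictReader takes its field names from the first non-comment line. This sketch uses line.startswith('#') in place of row[0] != '#'; for lines read from a file the two behave the same, but startswith also copes with an empty string.

```python
import csv
import io

# Hypothetical in-memory stand-in for samples.csv, shaped like the question's file
samples = io.StringIO(
    "# step size=1.61853\n"
    "val0,val1,val2\n"
    "0.206895,0.797923,0.202077\n"
    "# adaptation finished\n"
)

# filter() is lazy, so comment lines are dropped as DictReader pulls lines
rdr = csv.DictReader(filter(lambda line: not line.startswith('#'), samples))
rows = list(rdr)
print(rdr.fieldnames)  # ['val0', 'val1', 'val2']
print(rows)  # [{'val0': '0.206895', 'val1': '0.797923', 'val2': '0.202077'}]
```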
Upvotes: 124