I'm having an issue in pandas: when I read a CSV file while ignoring the header comments (comment="#"), every column value except the first is returned as NaN.
import pandas as pd
start_of_file = [
['# Accession: urn:mavedb:00000040-a-4'],
['# Downloaded (UTC): 2021-11-30 14:12:18.531917'],
['# Licence: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike)'],
['# Licence URL: https://creativecommons.org/licenses/by-nc-sa/4.0/'],
['accession', 'hgvs_nt', 'hgvs_splice', 'hgvs_pro', 'score'],
['urn:mavedb:00000040-a-4#1', 'NA', 'NA', 'p.Glu9Phe', '0.007373838825271998'],
]
# Export data frame...
pd.DataFrame(start_of_file).to_csv('test.csv', index=False)
# ... then read data frame while ignoring comments
pd.read_csv('test.csv', comment="#")
0 1 2 3 4
0 accession hgvs_nt hgvs_splice hgvs_pro score
1 urn:mavedb:00000040-a-4 NaN NaN NaN NaN
Answer: read the file without comment="#" and filter out the comment rows afterwards:
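import pandas as pd

df = pd.read_csv('test.csv')
# keep only the rows whose first column does not contain the "# " comment marker
df = df.iloc[[index for index in range(len(df)) if '# ' not in df['0'][index]]]
display(df)  # Jupyter/IPython; use print(df) in a plain script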
0 1 2 3 4
4 accession hgvs_nt hgvs_splice hgvs_pro score
5 urn:mavedb:00000040-a-4#1 NaN NaN p.Glu9Phe 0.007373838825271998
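or, checking only the first character of each value:
df = pd.read_csv('test.csv')
# keep only the rows whose first character is not '#'
df = df.iloc[[index for index in range(len(df)) if '#' != df['0'][index][0]]]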
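The same filtering can also be written as a vectorised boolean mask with Series.str.startswith. This is a minimal sketch, assuming the question's test.csv (recreated here so the block runs on its own) and the string column label '0' that read_csv takes from the 0,1,2,3,4 header line. Because that line is still used as the header, the accession row survives as a data row; promoting it to column names and converting score to a number are extra steps added here, not part of the answer above.

```python
import pandas as pd

# Recreate the question's test file (same rows, same to_csv call).
start_of_file = [
    ['# Accession: urn:mavedb:00000040-a-4'],
    ['# Downloaded (UTC): 2021-11-30 14:12:18.531917'],
    ['# Licence: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike)'],
    ['# Licence URL: https://creativecommons.org/licenses/by-nc-sa/4.0/'],
    ['accession', 'hgvs_nt', 'hgvs_splice', 'hgvs_pro', 'score'],
    ['urn:mavedb:00000040-a-4#1', 'NA', 'NA', 'p.Glu9Phe', '0.007373838825271998'],
]
pd.DataFrame(start_of_file).to_csv('test.csv', index=False)

# Read WITHOUT comment='#', so the '#1' inside the accession is kept.
df = pd.read_csv('test.csv')

# Boolean mask: True where the first column starts with '#'.
is_comment = df['0'].str.startswith('#')
df = df[~is_comment].reset_index(drop=True)

# Extra step: the first surviving row holds the real column names.
df.columns = df.iloc[0]
df.columns.name = None
df = df.iloc[1:].reset_index(drop=True)

# The score column was read as text because it contained the word 'score'.
df['score'] = pd.to_numeric(df['score'])
print(df)
```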
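The NaNs in the question come from how comment works: read_csv treats the comment character anywhere on a line as the start of a comment, not only at the beginning. The data line urn:mavedb:00000040-a-4#1,NA,NA,p.Glu9Phe,... is therefore cut at the #, and only urn:mavedb:00000040-a-4 survives. The sketch below reproduces that and shows one alternative that drops only lines starting with #. The helper read_csv_skipping_comment_lines is hypothetical, and the input assumes the comment lines sit directly above the real header row, as in start_of_file, rather than below the 0,1,2,3,4 header that to_csv adds in the question.

```python
import io

import pandas as pd

# Assumed layout: '#' metadata lines first, then a normal CSV header and data.
text = (
    "# Accession: urn:mavedb:00000040-a-4\n"
    "# Licence: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike)\n"
    "accession,hgvs_nt,hgvs_splice,hgvs_pro,score\n"
    "urn:mavedb:00000040-a-4#1,NA,NA,p.Glu9Phe,0.007373838825271998\n"
)

# comment='#' cuts every line at its first '#', including the '#1' inside
# the accession, so that row keeps only 'urn:mavedb:00000040-a-4'.
truncated = pd.read_csv(io.StringIO(text), comment='#')


def read_csv_skipping_comment_lines(lines, prefix='#', **kwargs):
    """Hypothetical helper: drop lines that START with prefix, parse the rest."""
    kept = [line for line in lines if not line.startswith(prefix)]
    # Lines from a file or StringIO keep their '\n', so ''.join rebuilds the text.
    return pd.read_csv(io.StringIO(''.join(kept)), **kwargs)


df = read_csv_skipping_comment_lines(io.StringIO(text))
print(truncated)
print(df)
```

For a file on disk, an open file object can be passed instead, e.g. with open('scores.csv') as f: df = read_csv_skipping_comment_lines(f) (the file name is only an example). For the question's own test.csv, writing it with to_csv('test.csv', index=False, header=False) avoids the extra 0,1,2,3,4 header line.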