irahorecka

Reputation: 1807

Pandas returns NaN values when using read_csv and ignoring comments

I'm having an issue in pandas where every value in the data row except the first comes back as NaN when I read a CSV file with comment="#" to skip the header comment lines.

import pandas as pd

start_of_file = [
['# Accession: urn:mavedb:00000040-a-4'],
['# Downloaded (UTC): 2021-11-30 14:12:18.531917'],
['# Licence: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike)'],
['# Licence URL: https://creativecommons.org/licenses/by-nc-sa/4.0/'],
['accession', 'hgvs_nt', 'hgvs_splice', 'hgvs_pro', 'score'],
['urn:mavedb:00000040-a-4#1', 'NA', 'NA', 'p.Glu9Phe', '0.007373838825271998'],
]

# Export data frame...
pd.DataFrame(start_of_file).to_csv('test.csv', index=False)

# ... then read data frame while ignoring comments
pd.read_csv('test.csv', comment="#")
                         0        1            2         3      4
0                accession  hgvs_nt  hgvs_splice  hgvs_pro  score
1  urn:mavedb:00000040-a-4      NaN          NaN       NaN    NaN
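This output is consistent with how the pandas documentation describes the comment parameter of read_csv: a line that starts with the comment character is skipped entirely, and if the character appears anywhere else on a line, the rest of that line is not parsed. The accession urn:mavedb:00000040-a-4#1 contains a #, so parsing of the data row stops there and the remaining columns are filled with NaN. Below is a minimal in-memory reproduction. The CSV text is a hypothetical cut-down version of test.csv, without the extra 0,1,2,3,4 label line that to_csv writes by default.

```python
import io

import pandas as pd

# Hypothetical cut-down copy of test.csv: one comment line, the real header, one data row
csv_text = (
    "# Accession: urn:mavedb:00000040-a-4\n"
    "accession,hgvs_nt,hgvs_splice,hgvs_pro,score\n"
    "urn:mavedb:00000040-a-4#1,NA,NA,p.Glu9Phe,0.007373838825271998\n"
)

df = pd.read_csv(io.StringIO(csv_text), comment="#")

# The leading comment line is skipped as intended, but the '#' inside the
# accession also starts a comment, so the rest of the data row is discarded
print(df.loc[0, "accession"])  # urn:mavedb:00000040-a-4
print(df.loc[0, "hgvs_pro"])   # nan
```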

Upvotes: 0

Views: 1046

Answers (1)

Youssef_boughanmi

Reputation: 379

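Read the file without comment="#" (that option also cuts the data row off at the # in urn:mavedb:00000040-a-4#1), then drop the comment rows yourself:

import pandas as pd

df = pd.read_csv('test.csv')
# Keep only the rows whose first column does not contain "# " (the comment lines)
df = df[~df['0'].str.contains('# ', regex=False)]
display(df)  # Jupyter/IPython; use print(df) in a plain script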

    0   1   2   3   4
4   accession   hgvs_nt hgvs_splice hgvs_pro    score
5   urn:mavedb:00000040-a-4#1   NaN NaN p.Glu9Phe   0.007373838825271998

or

df = pd.read_csv('test.csv')
# Stricter check: drop only the rows whose first column starts with '#'
df = df[~df['0'].str.startswith('#')]
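If the comments in the real files only ever appear as whole lines starting with #, another option is to remove those lines before pandas parses anything, so a # inside a field is left alone. The sketch below assumes exactly that; read_csv_skipping_comment_lines is a hypothetical helper name, not a pandas function. It also writes test.csv with header=False so the 0,1,2,3,4 column-label line that to_csv adds by default does not end up above the real header.

```python
import io

import pandas as pd


def read_csv_skipping_comment_lines(path, prefix="#", **kwargs):
    """Hypothetical helper: drop lines that start with `prefix`, then parse the rest.

    A `prefix` character in the middle of a line (such as the '#' in
    urn:mavedb:00000040-a-4#1) is left untouched, unlike read_csv(comment="#").
    """
    with open(path, encoding="utf-8") as fh:
        kept = [line for line in fh if not line.startswith(prefix)]
    return pd.read_csv(io.StringIO("".join(kept)), **kwargs)


start_of_file = [
    ['# Accession: urn:mavedb:00000040-a-4'],
    ['# Downloaded (UTC): 2021-11-30 14:12:18.531917'],
    ['# Licence: CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike)'],
    ['# Licence URL: https://creativecommons.org/licenses/by-nc-sa/4.0/'],
    ['accession', 'hgvs_nt', 'hgvs_splice', 'hgvs_pro', 'score'],
    ['urn:mavedb:00000040-a-4#1', 'NA', 'NA', 'p.Glu9Phe', '0.007373838825271998'],
]

# header=False: don't write the 0,1,2,3,4 column labels above the comment lines
pd.DataFrame(start_of_file).to_csv('test.csv', index=False, header=False)

df = read_csv_skipping_comment_lines('test.csv')
print(df)
```

Because the comment lines are removed before parsing, the first remaining line becomes the header, NA is read as NaN, and the score column is parsed as a float.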

Upvotes: 1
