Aaron González

Reputation: 65

Checking format of CSV using python

I'm currently working on a script that checks whether a batch of CSV files has the right format for another script to process them. I'm having trouble with some of the assertions the files have to pass. One is that there must be no missing values, for which I tried:

import pandas as pd

df = pd.read_csv("C:PATH\\test.csv", sep=',')

def check(self, file):
    try:
        assert df.notna().values.any()
    except AssertionError:
        assert False, "  NaN in data"

It does nothing: I tried it on a CSV with NaNs and it didn't raise an error. I also want to check that the file is comma-separated, because they may pass me a semicolon-separated one. This is my attempt:

try:
    assert len(df.columns) != 1 
except AssertionError:      
    "Not comma separated"

It behaves inconsistently: sometimes it raises the flag and sometimes it doesn't.

Is there something about "assert" that I misunderstood, or is the issue something else?
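For the separator check, here is a minimal sketch of a more explicit approach, assuming comma and semicolon are the only delimiters you expect to receive. It runs the standard library's csv.Sniffer on the header line and raises an exception when the delimiter is wrong. In the snippet above, the "Not comma separated" string in the except branch is only an expression, so nothing is ever reported. The function name check_delimiter is hypothetical.

```python
import csv


def check_delimiter(text: str, expected: str = ",") -> None:
    """Raise ValueError unless the header line of `text` uses `expected` as delimiter.

    Assumption: only comma and semicolon are plausible delimiters.
    """
    if not text.strip():
        raise ValueError("Empty file")
    header = text.splitlines()[0]
    try:
        # Restrict the sniffer to the delimiters we expect to receive.
        dialect = csv.Sniffer().sniff(header, delimiters=",;")
    except csv.Error:
        # Neither delimiter was found (e.g. a single-column file).
        raise ValueError("Could not detect a comma or semicolon delimiter") from None
    if dialect.delimiter != expected:
        raise ValueError(f"Not comma separated (detected {dialect.delimiter!r})")


check_delimiter("a,b,c\n1,2,3\n")  # passes silently
try:
    check_delimiter("a;b;c\n1;2;3\n")
except ValueError as err:
    print(err)  # Not comma separated (detected ';')
```

To use it on a file before calling pd.read_csv, read just the header, e.g. `with open(path, newline="") as f: check_delimiter(f.readline())`, where path is your own file path.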

Upvotes: 1

Views: 805

Answers (2)

Rajat Jain

Reputation: 2032

pd.notna() (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.notna.html) requires you to provide the DataFrame as an argument.

Please try using notnull() as below:

import pandas as pd

df = pd.read_csv("C:PATH\\test.csv", sep=',')

def check(file):
    try:
        assert file.notnull().values.any() 
    except AssertionError:
        assert False, "  NaN in data"

check(df)
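For reference, a small sketch (using a made-up two-row DataFrame) showing that on a DataFrame, notnull() is an alias of the notna() method, and that the function form pd.notna(df) returns the same boolean mask. So the choice between them does not change the result; what matters is passing the DataFrame in and how the mask is reduced.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})

mask_method = df.notna()      # DataFrame method
mask_alias = df.notnull()     # alias of DataFrame.notna
mask_function = pd.notna(df)  # top-level function, DataFrame passed as argument

print(mask_method.equals(mask_alias))     # True
print(mask_method.equals(mask_function))  # True
```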

Upvotes: 0

vercelli

Reputation: 4767

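Pass df as a parameter to check(), and change .any() to .all(): .values.any() is True as soon as at least one cell is non-missing, so it passes even when the data contains NaN, whereas .values.all() is True only when every cell is non-missing.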

import pandas as pd

df = pd.read_csv("C:\\PATH\\test.csv", sep=',')

def check(file):
    try:
        assert file.notna().values.all()  
    except AssertionError:
        assert False, "  NaN in data"

check(df)
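A quick sketch on a made-up DataFrame illustrating the .any() versus .all() difference. It also shows a variant that raises ValueError instead of relying on assert. This is a swapped-in technique, not part of the answer above: assert statements are skipped entirely when Python runs with the -O flag, so an explicit exception is a more dependable validation signal. The name check_no_missing is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN among four cells.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# .any(): True as soon as ONE cell is non-missing, so the NaN slips through.
print(df.notna().values.any())  # True

# .all(): True only if EVERY cell is non-missing, so the NaN is caught.
print(df.notna().values.all())  # False


def check_no_missing(frame: pd.DataFrame) -> None:
    """Raise ValueError if `frame` contains any missing value."""
    if frame.isna().values.any():
        raise ValueError("NaN in data")
```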

Upvotes: 2
