Yoni
Yoni

Reputation: 27

Python: How to check if there is no value in a csv column using a list

I have a CSV file and I want to check for each row if it has one or more values in different columns which I specified in a list. If there is no value in any column it should add up to a counter so I know how many rows are empty. But if it has one value in one column from the list it shouldn't do anything.

The CSV file is like this: csv file

I made the code below but it is returning 0 which is not correct.

import pandas as pd

testfile = 'test1.csv'

df = pd.read_csv(testfile)

column_names = ['Uniprot_acc',
'Uniprot_id',
'Interpro_domain',
'Ensembl_geneid',
'Ensembl_transcriptid',
'SIFT_score',
'SIFT_pred']

counter = 0

for row in df:
    for column_name in column_names:
        if column_name in row:
            if column_name == None:
                counter =+ 1

print(counter)

What I want to know is how many rows don't contain anything. It should check per row for every column in the list if there is no value. And if indeed there is nothing in the row it should count. So in this example it should be 3.

Upvotes: 2

Views: 4824

Answers (1)

jezrael
jezrael

Reputation: 862731

Use:

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)

Sample:

df = pd.DataFrame({
         'A':list('abcdef'),
         'Uniprot_acc':[np.nan,5,4,5,np.nan,4],
         'Uniprot_id':[np.nan,8,9,4,np.nan,np.nan],
         'Interpro_domain':[np.nan,3,np.nan,7,np.nan,0],
         'E':[5,3,np.nan,9,np.nan,4],

})

column_names = ['Uniprot_acc',
                'Uniprot_id',
                'Interpro_domain']

print (df)
   A  Uniprot_acc  Uniprot_id  Interpro_domain    E
0  a          NaN         NaN              NaN  5.0
1  b          5.0         8.0              3.0  3.0
2  c          4.0         9.0              NaN  NaN
3  d          5.0         4.0              7.0  9.0
4  e          NaN         NaN              NaN  NaN
5  f          4.0         NaN              0.0  4.0

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)
2

Explanation:

First filter columns by list:

print (df[column_names])
   Uniprot_acc  Uniprot_id  Interpro_domain
0          NaN         NaN              NaN
1          5.0         8.0              3.0
2          4.0         9.0              NaN
3          5.0         4.0              7.0
4          NaN         NaN              NaN
5          4.0         NaN              0.0

Then check missing values None and NaNs:

print (df[column_names].isnull())
   Uniprot_acc  Uniprot_id  Interpro_domain
0         True        True             True
1        False       False            False
2        False       False             True
3        False       False            False
4         True        True             True
5        False        True            False

Check if all Trues per rows by DataFrame.all:

print (df[column_names].isnull().all(axis=1))
0     True
1    False
2    False
3    False
4     True
5    False
dtype: bool

And last count only Trues by sum.

Upvotes: 3

Related Questions