Reputation: 455
I am working with a DataFrame and trying to count the number of '?' values in it. Here is part of the CSV:
age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
25, Private,226802, 11th,7, Never-married, Machine-op-inspct, Own-child, Black, Male,0,0,40, United-States, <=50K
38, Private,89814, HS-grad,9, Married-civ-spouse, Farming-fishing, Husband, White, Male,0,0,50, United-States, <=50K
28, Local-gov,336951, Assoc-acdm,12, Married-civ-spouse, Protective-serv, Husband, White, Male,0,0,40, United-States, >50K
44, Private,160323, Some-college,10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male,7688,0,40, United-States, >50K
18, ?,103497, Some-college,10, Never-married, ?, Own-child, White, Female,0,0,30, United-States, <=50K
34, Private,198693, 10th,6, Never-married, Other-service, Not-in-family, White, Male,0,0,30, United-States, <=50K
29, ?,227026, HS-grad,9, Never-married, ?, Unmarried, Black, Male,0,0,40, United-States, <=50K
I am using
df.isin(['?']).sum(axis=0)
but it returns 0 for all columns, although there are '?' in the data.
How do I fix it?
Thanks
Upvotes: 0
Views: 161
Reputation: 15488
There is an extra space before the values, so match ' ?' instead:
df.isin([' ?']).sum(axis=0)
Alternatively, you can strip the whitespace from the values first ;-)
df['workclass'].str.strip().isin(['?'])
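To apply the same strip-then-compare idea to every text column at once, something like the following might work. It is only a sketch: the inline `csv_text` sample is a cut-down, hypothetical stand-in for the question's file, and the names `stripped` and `counts` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical cut-down sample with the same ", " separator as the question's CSV.
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, Unmarried
"""
df = pd.read_csv(io.StringIO(csv_text))

# Strip surrounding whitespace in every string column; numeric columns are left alone.
stripped = df.copy()
for col in stripped.columns:
    if pd.api.types.is_string_dtype(stripped[col]):
        stripped[col] = stripped[col].str.strip()

# Now the exact match on '?' finds the placeholders.
counts = stripped.isin(['?']).sum(axis=0)
print(counts)
```

Working on a copy keeps the original `df` untouched, so you can compare the counts before and after stripping.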
Upvotes: 2
Reputation: 6181
The problem is that in this table the ? appears with a leading space. Try:
df.isin([' ?']).sum(axis=0)
In general, I would suggest formatting the relevant columns beforehand (for example, stripping the whitespace).
You can see the extra space by examining df.iloc[6]['occupation']
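One way to do that clean-up at load time, assuming the data comes in through `pd.read_csv`, is the `skipinitialspace=True` option, which drops the spaces that follow each delimiter. A small sketch with a hypothetical inline sample (`csv_text`, `raw` and `clean` are illustrative names, not from the question):

```python
import io

import pandas as pd

# Hypothetical cut-down sample with the same ", " separator as the question's CSV.
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, Unmarried
"""

# Default parsing keeps the leading space, so the values are ' ?' rather than '?'.
raw = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops whitespace right after each delimiter while parsing.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(repr(raw.iloc[1]['occupation']))
print(repr(clean.iloc[1]['occupation']))
print(clean.isin(['?']).sum(axis=0))
```

With the spaces removed during parsing, the original `df.isin(['?']).sum(axis=0)` from the question should work unchanged.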
Upvotes: 2