Coder

Reputation: 455

Why does "df.isin" not work with my data?

I am working with a data frame and trying to count the number of '?' values in it. Here is part of the CSV:

age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
25, Private,226802, 11th,7, Never-married, Machine-op-inspct, Own-child, Black, Male,0,0,40, United-States, <=50K
38, Private,89814, HS-grad,9, Married-civ-spouse, Farming-fishing, Husband, White, Male,0,0,50, United-States, <=50K
28, Local-gov,336951, Assoc-acdm,12, Married-civ-spouse, Protective-serv, Husband, White, Male,0,0,40, United-States, >50K
44, Private,160323, Some-college,10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male,7688,0,40, United-States, >50K
18, ?,103497, Some-college,10, Never-married, ?, Own-child, White, Female,0,0,30, United-States, <=50K
34, Private,198693, 10th,6, Never-married, Other-service, Not-in-family, White, Male,0,0,30, United-States, <=50K
29, ?,227026, HS-grad,9, Never-married, ?, Unmarried, Black, Male,0,0,40, United-States, <=50K

I am using

df.isin(['?']).sum(axis=0)

but it returns 0 for all columns, although there are '?' in the data.

How do I fix it?

Thanks

Upvotes: 0

Views: 161

Answers (2)

wasif

Reputation: 15488

There is an extra space before each value, so match ' ?' instead of '?':

df.isin([' ?']).sum(axis=0)

Alternatively, you can strip the values before comparing ;-)

df['workclass'].str.strip().isin(['?'])
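
To apply the same idea to the whole frame rather than one column, something like the following might work. It is only a sketch: the three-column sample and the helper name strip_text_columns are made up for illustration, and it assumes every non-numeric column holds plain strings.

```python
import io

import pandas as pd

# Hypothetical three-column sample mirroring the question's CSV
# (note the space after each comma).
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, ?
"""
df = pd.read_csv(io.StringIO(csv_text))


def strip_text_columns(frame):
    """Return a copy with whitespace stripped from non-numeric columns."""
    return frame.apply(
        lambda col: col if pd.api.types.is_numeric_dtype(col) else col.str.strip()
    )


stripped = strip_text_columns(df)

# Count '?' per column before and after stripping.
before = df.isin(["?"]).sum(axis=0)
after = stripped.isin(["?"]).sum(axis=0)
print(before)
print(after)
```

Numeric columns such as age are passed through untouched, because the .str accessor is only available on string data.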

Upvotes: 2

Tom Ron

Reputation: 6181

The problem is that in this table the ? is preceded by an extra space, so the stored value is ' ?' rather than '?'. Try:

df.isin([' ?']).sum(axis=0)

In general, I would suggest cleaning up the relevant columns beforehand, for example by stripping the leading whitespace when the file is loaded.
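
One way to do that cleanup at load time is with the skipinitialspace and na_values parameters of pd.read_csv. This is a sketch on a made-up inline sample, not the original file; the variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample with the same "comma + space" layout as the question.
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, ?
"""

# skipinitialspace=True drops the spaces that follow each delimiter,
# so the values are read as '?' instead of ' ?'.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
question_marks = clean.isin(["?"]).sum(axis=0)
print(question_marks)

# Optionally treat '?' as missing data and count it with isna().
with_nan = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, na_values=["?"])
missing = with_nan.isna().sum(axis=0)
print(missing)
```

With the values cleaned on load, the original df.isin(['?']).sum(axis=0) from the question should work unchanged.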

You can see the extra space by examining a single cell, e.g. df.iloc[6]['occupation']
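
Printing the cell with repr() makes the leading space easier to spot, since the quotes show exactly where the string starts. A small sketch on a made-up two-row sample (so the row index here is 1, not 6):

```python
import io

import pandas as pd

# Hypothetical two-row sample; row 1 holds the '?' placeholders.
csv_text = "age,workclass,occupation\n25, Private, Machine-op-inspct\n18, ?, ?\n"
df = pd.read_csv(io.StringIO(csv_text))

value = df.iloc[1]["occupation"]
print(repr(value))  # the quotes reveal the leading space

# Listing the distinct values of a column exposes the same problem.
distinct = df["workclass"].unique().tolist()
print(distinct)
```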

Upvotes: 2
