user2129623

Reputation: 2257

Identifying none value from column

I am reading a CSV file with pandas to perform some analysis on it, and I am getting this error:

ValueError: could not convert string to float: 'none'

I checked, and the error is caused by the shift_zip column. I opened the CSV file in OpenOffice and manually converted this column to numeric, but it still gives this error.

The data looks like this:

[screenshot in the original post]

I manually checked the shift_zip column but cannot find a none value in it.

I also tried printing this column's values and their data type, which gives <class 'int'>.

for val in data['nurse_zip']:
#     print((val))
    if type(val) != 'int':
        print(type((val)))

output

<class 'int'>
<class 'int'>
<class 'int'>

How do I correctly identify which none value in this column is causing this issue?

Edit 1: Adding more code for better understanding:

import pandas as pd

dataset = pd.read_csv("model__newdata.csv", header=0)


# Data pre-processing: drop the columns that are not needed
data = dataset.drop('shift_location_id', axis=1)
data = data.drop('status', axis=1)
data = data.drop('city', axis=1)
data = data.drop('open_positions', axis=1)
# data = data.drop('shift_id', axis=1)
# data = data.drop('role_id', axis=1)
# data = data.drop('specialty_id', axis=1)
# data = data.drop('years_of_experience', axis=1)
# data = data.drop('shifts_zip', axis=1)
# data = data.drop('nurse_zip', axis=1)
# data = data.drop('shift_department_id', axis=1)
# data = data.drop('shift_organization_id', axis=1)
# data = data.drop('user_id', axis=1)


# Find the median for features having NaN
median_role_id = data['role_id'].median()
median_shift_id = data['shift_id'].median()
median_specialty_id = data['specialty_id'].median()

# Fill missing values (assigning back avoids chained in-place modification)
data['shift_id'] = data['shift_id'].fillna(median_shift_id)
data['role_id'] = data['role_id'].fillna(median_role_id)
data['specialty_id'] = data['specialty_id'].fillna(median_specialty_id)
data['years_of_experience'] = data['years_of_experience'].fillna(0)
data['shifts_zip'] = data['shifts_zip'].fillna(0)  # Gives none value error
data['nurse_zip'] = data['nurse_zip'].fillna(0)
data['shift_department_id'] = data['shift_department_id'].fillna(0)
data['shift_organization_id'] = data['shift_organization_id'].fillna(0)
data['user_id'] = data['user_id'].fillna(0)

print (data[data['nurse_zip'] == 'none'])

Output

Empty DataFrame
Columns: [shift_id, user_id, shift_organization_id, shift_department_id, role_id, specialty_id, years_of_experience, nurse_zip, shifts_zip]
Index: []

Edit 2

Result of jezrael's answer:

It gives False or True for each row, as per the condition. I cannot tell which particular row is none or empty.

Upvotes: 1

Views: 1181

Answers (2)

jezrael

Reputation: 863166

You can try:

#check string none
print (data[data['nurse_zip'] == 'none'])

#check non integer values
print (data[data['nurse_zip'].apply(type) != int])

#check strings values
print (data[data['nurse_zip'].apply(type) == str])

#check missing values
print (data[data['nurse_zip'].isnull()])
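
To see the offending rows themselves rather than a True/False mask, one option is to coerce the column with pd.to_numeric and keep the rows that fail to convert. This is only a sketch: the sample frame and the helper name find_non_numeric are made up for illustration, and the real data may contain other placeholders besides 'none'.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV; 'none' here is a string,
# not a missing value.
data = pd.DataFrame({
    'nurse_zip': [94107, 10001, 60601, 73301],
    'shifts_zip': ['94107', 'none', '60601', '73301'],
})


def find_non_numeric(frame, column):
    """Return the rows whose value in `column` cannot be parsed as a number."""
    converted = pd.to_numeric(frame[column], errors='coerce')
    # Values that were present but turned into NaN after coercion are the offenders.
    bad_mask = converted.isna() & frame[column].notna()
    return frame[bad_mask]


bad_rows = find_non_numeric(data, 'shifts_zip')
print(bad_rows)
```

The index of bad_rows gives the row positions to inspect in the original file. Note that the error in the question points at the shifts_zip column, so that is the column to check rather than nurse_zip.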

Upvotes: 1

aman nagariya

Reputation: 172

If the objective is to find NaN or null values, simply use

df.info()

and you will be able to see the datatype of each column as well as its non-null count, from which the number of missing values follows.
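
As a rough illustration, assuming a small made-up CSV in place of model__newdata.csv, the following shows what df.info() reports and how isna().sum() gives the missing count directly. A column that holds a stray string such as 'none' shows up with a non-numeric dtype, which is usually the quickest hint.

```python
import io
import pandas as pd

# Hypothetical CSV content; row 2 holds the string 'none', row 3 is truly empty.
csv_text = "shift_id,shifts_zip\n1,94107\n2,none\n3,\n4,60601\n"
df = pd.read_csv(io.StringIO(csv_text))

df.info()  # prints the dtype and non-null count of every column

# Number of genuinely missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# A non-numeric dtype on a column that should be numeric signals stray strings.
print(df.dtypes)
```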

But I think that in your dataset the value causing the problem is not stored as a null. You can try the following:

1: Visualize the particular column using a histogram or any other plot.
2: Use df[column].astype to force a change of the column's dtype.
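
A minimal sketch of the astype suggestion, assuming a made-up column that contains the string 'none': the conversion fails on the first value it cannot parse and names that value in the error, and it succeeds once the placeholder is replaced by NaN. Whether 'none' should become NaN, 0, or something else depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical column; the string 'none' mimics the value from the error message.
df = pd.DataFrame({'shifts_zip': ['94107', 'none', '60601']})

try:
    df['shifts_zip'].astype(float)
    error_message = None
except ValueError as exc:
    # astype stops at the first unparseable value and reports it.
    error_message = str(exc)
print(error_message)

# Once the placeholder is mapped to NaN, the same conversion goes through.
cleaned = df['shifts_zip'].replace('none', np.nan).astype(float)
print(cleaned)
```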

Upvotes: 2
