aiman khalid
aiman khalid

Reputation: 147

how to find a string in the pandas rows where a float should it be?

I have datasets consist of hundred thousand rows and hundreds column. All rows should be in float64 (data type) but a few of the rows recognized as object when I imported the file. I know maybe it contains some words in the rows, but I don't know exactly what the words is, it could be "deleted", "com fail" or whatever string that previous user type in. So how to find the string in the rows where a float should be? so I can remove it and convert it into float datatype

Ex:

df
date         sensor1 sensor2 sensor3 sensor4
01/01/2020    100     65       66      89
01/02/2020     94    deleted   99     100
01/03/2020     89     77      hello    55

the strings maybe easy and can be detected by naked eyes if i have few hundreds row, but if you have million of it, what is the best practice and way to find the string?

Thank you.

Upvotes: 0

Views: 804

Answers (1)

jezrael
jezrael

Reputation: 862511

For convert non numeric to missing values use to_numeric with errors='coerce' in DataFrame.apply for processing all columns:

df1 = df.apply(pd.to_numeric, errors='coerce')
print (df1)
            sensor1  sensor2  sensor3  sensor4
date                                          
01/01/2020      100     65.0     66.0       89
01/02/2020       94      NaN     99.0      100
01/03/2020       89     77.0      NaN       55

For find rows with non numeric (if no missing values in original data) test if missing values after converting at least in one column by DataFrame.isna and DataFrame.any and filtering in boolean indexing:

df2 = df[df.apply(pd.to_numeric, errors='coerce').isna().any(axis=1)]
print (df2)
            sensor1  sensor2 sensor3  sensor4
date                                         
01/02/2020       94  deleted      99      100
01/03/2020       89       77   hello       55

Upvotes: 1

Related Questions