Reputation: 27
I am currently working with a database which is generated directly from machine instruments (PI data). I am trying to remove information from the database which is mainly related to communication errors. For example:
import pandas as pd
import numpy as np

df_ex = pd.DataFrame({'Ambient': [20.11112, 'Bad', 200.22222],
                      'Humidity': [50.47585, 89.14251, 'Bad Input'],
                      'Compressor Temp': [50.47585, 89.14251, 72.89657],
                      'Bearing Temp': ['Scan off', 74.45551, 52.2227]})
df_ex
Ambient Humidity Compressor Temp Bearing Temp
0 20.11112 50.47585 50.47585 Scan off
1 Bad 89.14251 89.14251 74.45551
2 200.22222 Bad Input 72.89657 52.2227
The errors received in the dataframe are filtered out using the code below:
df_ex.replace(['Bad', 'Bad Input', 'Scan off'], np.nan, inplace=True)
df_ex.replace(" ", np.nan, inplace=True)
df_ex.dropna(inplace=True)
I am trying to figure out a way to remove errors such as "Bad", "Bad Input", etc. from the dataframe by filtering out unknown errors. Most of the time I filter the errors manually from the Excel file.
Due to the huge size of the database, looking at unique values also doesn't help with filtering, as there are many unique decimal values.
I used this code to find unique values and spot errors in the dataframe:
df_ex['Ambient'].unique()
The errors change from time to time depending on the network communication failure (an unknown error). So instead of "Bad" the data may contain "Bad Data", which I then need to remove manually. If I fail to find the exact name of the error, my code doesn't run.
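What I essentially want is to list only the values that are not numbers, so the error strings stand out from the many distinct readings. A rough sketch of the idea (I am not sure this is the right approach):

# list only the values in each column that fail numeric conversion,
# so the error strings stand out from the many distinct sensor readings
for col in df_ex.columns:
    col_values = df_ex[col]
    is_error = pd.to_numeric(col_values, errors='coerce').isna() & col_values.notna()
    if is_error.any():
        print(col, col_values[is_error].unique())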
Can someone help with this?
Thank you for your time.
I tried out = df.apply(pd.to_numeric, errors='coerce'), which removed the errors but also changed the datetime column into NaN.
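For example, with a hypothetical 'Timestamp' column stored as strings (my real column name is different), the same call also wipes out the dates:

df_dt = pd.DataFrame({'Timestamp': ['2021-01-01 00:00', '2021-01-01 01:00'],
                      'Ambient': [20.11112, 'Bad']})
df_dt.apply(pd.to_numeric, errors='coerce')  # 'Timestamp' becomes NaN as well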
Upvotes: 0
Views: 509
Reputation: 260640
IIUC, you have a dataframe with numerical data and sometimes string values which denote incorrect data.
You could convert to numeric with errors="coerce" to force invalid data to NaN:
out = df_ex.apply(pd.to_numeric, errors='coerce')
output:
Ambient Humidity Compressor Temp Bearing Temp
0 20.11112 50.47585 50.47585 NaN
1 NaN 89.14251 89.14251 74.45551
2 200.22222 NaN 72.89657 52.22270
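If there is also a datetime column that should not be coerced, you could restrict the conversion to the measurement columns and then drop the rows that contain errors. A minimal sketch, assuming a hypothetical 'Timestamp' column:

import pandas as pd

df = pd.DataFrame({'Timestamp': pd.to_datetime(['2021-01-01 00:00', '2021-01-01 01:00', '2021-01-01 02:00']),
                   'Ambient': [20.11112, 'Bad', 200.22222],
                   'Bearing Temp': ['Scan off', 74.45551, 52.2227]})

# convert only the measurement columns; the timestamp column is left untouched
value_cols = df.columns.difference(['Timestamp'])
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors='coerce')

# drop the rows where any measurement failed to convert (i.e. was an error string)
out = df.dropna(subset=value_cols)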
Upvotes: 1