DBV
DBV

Reputation: 27

filter unknown string from pandas dataframe

I am currently working with database which is generated directly from machine instruments.(PI Data) I am trying to remove information from the database which is mainly related to communication error. For example

df_ex = pd.DataFrame({'Ambient': [20.11112, 'Bad',200.22222],
                  'Humidity': [50.47585, 89.14251,'Bad Input'],
                  'Compressor Temp': [50.47585, 89.14251,72.89657],
                  'Bearing Temp': ['Scan off', 74.45551, 52.2227]})

df_ex:

     Ambient   Humidity  Compressor Temp Bearing Temp
0   20.11112   50.47585         50.47585     Scan off
1        Bad   89.14251         89.14251     74.45551
2  200.22222  Bad Input         72.89657      52.2227

Errors received with df is filtered out by using below code.

df_ex.replace(['Bad','Bad Input','Scan Off'],np.nan, inplace=True)
df_ex.replace(" ", np.nan, inplace=True)
df_ex.dropna(inplace = True)

I am trying to figure out way on how can I remove errors "Bad", "Bad Input" etc from dataframe by filtering out unknown error. I am filtering errors manually most of the time from excel file.

Due to huge size of data base trying unique value also doesn't help in filtering as there are many unique decimal values.

I used this code to find unique values to find error in the dataframe

df_ex['Ambient'].unique()

Errors are changing time to time depending on network communication failure (Unknown error). So instead of "Bad" it may have "Bad Data" which I need to remove manually. If I fail to find out correct name of error my code doesn't run.

Can some one help on this?

Thank you for your time.

I tried out = df.apply(pd.to_numeric, errors = 'coerce') which removed errors but changed Date time in to nan. Before

After

Upvotes: 0

Views: 509

Answers (1)

mozway
mozway

Reputation: 260640

IIUC, you have a dataframe with numerical data and sometimes string values which denote incorrect data.

You could convert to numeric with errors="coerce" to force invalid data to NaN:

out = df_ex.apply(pd.to_numeric, errors='coerce')

output:

     Ambient  Humidity  Compressor Temp  Bearing Temp
0   20.11112  50.47585         50.47585           NaN
1        NaN  89.14251         89.14251      74.45551
2  200.22222       NaN         72.89657      52.22270

Upvotes: 1

Related Questions