Reputation: 85
I have a pandas DataFrame in which some observations contain empty strings, and I want to replace those with NaN (np.nan).
I am successfully replacing most of these empty strings using
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
But I am still finding empty strings. For example, when I run
sub_df = df[df['OBJECT_COL'] == '']
sub_df.replace(r'\s+', np.nan, regex = True)
print(sub_df['OBJECT_COL'] == '')
The output is True for every row.
Is there a different method I should be trying? Is there a way to read the encoding of these cells, in case my .replace() is not effective because the encoding is weird?
Upvotes: 3
Views: 628
Reputation: 8816
Some alternatives:
sub_df.replace(r'^\s+$', np.nan, regex=True)
Or, to replace both empty strings and strings containing only spaces:
sub_df.replace(r'^\s*$', np.nan, regex=True)
Alternatively, use apply() with a lambda function:
sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
>>> import numpy as np
>>> import pandas as pd
Example DataFrame containing empty strings and whitespace-only strings:
>>> sub_df
col_A
0
1
2 somevalue
3 othervalue
4
Best solution:
1) Chain two replace() calls, one for whitespace and one for empty strings:
>>> sub_df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
col_A
0 NaN
1 NaN
2 somevalue
3 othervalue
4 NaN
2) This works only partially: it replaces whitespace-only strings but leaves empty strings untouched:
>>> sub_df.replace(r'^\s+$', np.nan, regex=True)
col_A
0
1 NaN
2 somevalue
3 othervalue
4 NaN
3) This works for both cases (empty strings and whitespace-only strings):
>>> sub_df.replace(r'^\s*$', np.nan, regex=True)
col_A
0 NaN
1 NaN
2 somevalue
3 othervalue
4 NaN
4) This also works for both conditions.
>>> sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
col_A
0 NaN
1 NaN
2 somevalue
3 othervalue
4 NaN
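To see why the three patterns behave differently, here is a minimal sketch using only Python's re module. The sample cell values are made up for illustration. It also assumes that, when the replacement is not a string (such as np.nan), pandas replaces the whole cell whenever the pattern matches anywhere in it, which is why re.search is used here.

```python
import re

# Sample cell values (hypothetical): an empty string, a whitespace-only
# string, and text containing an internal space.
cells = ["", "   ", "some value"]

patterns = {
    "any_whitespace": r"\s+",
    "only_whitespace": r"^\s+$",
    "empty_or_whitespace": r"^\s*$",
}

# For each pattern, collect the cells it matches anywhere (re.search).
matches = {
    name: [cell for cell in cells if re.search(pattern, cell) is not None]
    for name, pattern in patterns.items()
}

for name, matched in matches.items():
    print(name, [repr(cell) for cell in matched])
```

Note that the unanchored r'\s+' also matches 'some value', so under the assumption above it may blank out cells that merely contain a space. The anchored r'^\s*$' only matches cells that are empty or entirely whitespace.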
Upvotes: 3
Reputation: 164693
pd.Series.replace does not work in-place by default. You need to specify inplace=True explicitly:
sub_df.replace(r'\s+', np.nan, regex=True, inplace=True)
Or, alternatively, assign back to sub_df:
sub_df = sub_df.replace(r'\s+', np.nan, regex=True)
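A small sketch of the difference, using a made-up three-row frame (the data and the variable names unchanged_blank_count, cleaned and missing_count are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an empty string, a whitespace-only string, a real value.
sub_df = pd.DataFrame({"OBJECT_COL": ["", "   ", "somevalue"]})

# Without assignment the result is discarded, so sub_df is unchanged.
sub_df.replace(r"^\s*$", np.nan, regex=True)
unchanged_blank_count = int((sub_df["OBJECT_COL"] == "").sum())

# Assigning the result keeps the replacement.
cleaned = sub_df.replace(r"^\s*$", np.nan, regex=True)
missing_count = int(cleaned["OBJECT_COL"].isna().sum())

print(unchanged_blank_count, missing_count)
```

The first call leaves the empty string in place, which matches what the question observed. Only the assigned result has NaN in both blank rows.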
Upvotes: 2
Reputation: 4792
Try np.where:
df['OBJECT_COL'] = np.where(df['OBJECT_COL'] == '', np.nan, df['OBJECT_COL'])
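Note that the comparison above only catches exactly empty strings. If whitespace-only cells should be treated the same way, one possible variation is to strip before comparing. This is a sketch on made-up data, and is_blank is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Hypothetical sample column: empty string, whitespace-only string, real value.
df = pd.DataFrame({"OBJECT_COL": ["", "  ", "somevalue"]})

# Strip surrounding whitespace before comparing, so "  " is treated like "".
is_blank = df["OBJECT_COL"].str.strip() == ""
df["OBJECT_COL"] = np.where(is_blank, np.nan, df["OBJECT_COL"])

print(df["OBJECT_COL"].isna().tolist())
```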
Upvotes: 0