Reputation: 8578
I have a dataframe column that generally contains numeric values, but some entries are written as 'No Data':
import pandas as pd

df = pd.DataFrame({'group': [1, 2, 3, ' No Data', 4, 5, 6]})
However, some entries, like in the example above, may start with a whitespace, as in ' No Data'. I am trying to find an efficient way to replace all No Data entries with NaN without worrying about that extra space. For now, what I have is:
import numpy as np

df['group'] = df['group'].replace('No Data', np.nan)
print(df)
which does not work unless I include the space in the search term:
df['group'] = df['group'].replace(' No Data', np.nan)
While this works, I was hoping there is a better way to do it without checking for the extra space.
Note: I can also achieve this with a list comprehension and the in operator:
[np.nan if 'No Data' in str(x) else x for x in df['group']]
which works just fine, but I think replace is more readable.
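For reference, a different route (my own suggestion, not something the asker mentions) is pd.to_numeric with errors='coerce', which converts every entry that cannot be parsed as a number to NaN, so the leading whitespace never matters. Note this coerces any non-numeric entry, not just 'No Data':

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 2, 3, ' No Data', 4, 5, 6]})
# errors='coerce' maps anything that cannot be parsed as a number to NaN,
# so the leading space in ' No Data' is irrelevant.
df['group'] = pd.to_numeric(df['group'], errors='coerce')
print(df)
```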
Upvotes: 1
Views: 281
Reputation: 11971
You can achieve your desired result by passing a regex to the replace call you already use:
import pandas as pd
import numpy as np

df = pd.DataFrame({'group': [1, 2, 3, ' No Data', 4, 5, 6]})
df = df.replace(r'No Data', np.nan, regex=True)
print(df)
Output
group
0 1
1 2
2 3
3 NaN
4 4
5 5
6 6
The regex r'No Data' matches the string 'No Data' anywhere in a cell. If it finds a match, the whole cell is replaced with np.nan.
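To make the "anywhere in the cell" behaviour concrete, here is a small check (the test frame with both variants is mine, not from the question):

```python
import pandas as pd
import numpy as np

# Both the clean and the whitespace-padded variants contain 'No Data',
# so both cells become NaN; the numeric cells are left untouched.
df = pd.DataFrame({'group': [1, 'No Data', ' No Data', 2]})
out = df.replace(r'No Data', np.nan, regex=True)
print(out)
```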
An alternative would be applymap (renamed to DataFrame.map in pandas 2.1, where applymap is deprecated), which applies a simple lambda function to each cell in the dataframe. I think this is a pretty readable solution:
import pandas as pd
import numpy as np

df = pd.DataFrame({'group': [1, 2, 3, ' No Data', 4, 5, 6]})
# Replace any cell whose string value contains 'No Data'; leave everything else alone.
df = df.applymap(lambda x: np.nan if isinstance(x, str) and 'No Data' in x else x)
print(df)
Output
group
0 1
1 2
2 3
3 NaN
4 4
5 5
6 6
Upvotes: 1
Reputation: 5223
Pandas lets you use regular expressions with the replace function, so you can absorb any leading whitespace in the pattern itself:
df['group'].replace(r'\s*No Data', np.nan, regex=True)
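As a quick sanity check (the surrounding frame setup is mine, mirroring the question's data), the \s* consumes the optional leading whitespace, and because the replacement value is not a string, the entire matching cell becomes NaN:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'group': [1, 2, 3, ' No Data', 4, 5, 6]})
# The raw string avoids backslash-escape warnings; \s* matches zero or
# more whitespace characters before 'No Data'.
out = df['group'].replace(r'\s*No Data', np.nan, regex=True)
print(out)
```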
Upvotes: 1