Mary

Reputation: 1142

Replacing blank spaces with null values for a subset of a data frame

For the following data frame,

 id  words   A   B   C   D  E  
 1   new a   1       1   
 2   good v  1  
 3   star c          1
 4   never                  
 5   final   

I tried to replace the blank spaces with null values using the following code:

df1.loc[:, ["A", "B", "C", "E", "D" ]].replace (r'\s+', np.nan, regex = True, inplace = True)

But it did not work. I also tried this code:

df1[["A", "B", "C", "E", "D" ]].replace (r'\s+', np.nan, regex = True, inplace = True)

It also did not work.

But using the following code, it worked:

df1.A.replace (r'\s+', np.nan, regex = True, inplace = True)
df1.B.replace (r'\s+', np.nan, regex = True, inplace = True)
df1.C.replace (r'\s+', np.nan, regex = True, inplace = True)
df1.D.replace (r'\s+', np.nan, regex = True, inplace = True)
df1.E.replace (r'\s+', np.nan, regex = True, inplace = True)

Does anyone know why? Thanks.

Upvotes: 1

Views: 3893

Answers (2)

cfort

Reputation: 2776

The answer by @ayhan is much better, but I offer this as a quick-and-dirty method of just replacing a bunch of blanks with NaNs:

df1.replace('', np.nan, inplace=True)
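One caveat with this shortcut, shown as a minimal sketch on made-up data (the column name and values are assumptions, not the asker's frame): a plain, non-regex replace of '' matches only cells that are exactly the empty string, so cells that hold one or more spaces are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a number, an empty string, and a whitespace-only string
df = pd.DataFrame({"A": [1, "", "  "]})

# Non-regex replace: only the exactly-empty string is swapped for NaN
out = df.replace("", np.nan)

print(out["A"].isna().tolist())  # [False, True, False]
```

If the blanks in the real data are spaces rather than empty strings, a regex-based replace is needed instead.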

Upvotes: 1

user2285236

Reputation:

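When you select several columns from a DataFrame with a list, the returned object is a copy. If you call a method on that copy with inplace=True, the change is applied to the copy, not to the original DataFrame. Selecting a single column such as df1.A, by contrast, has traditionally returned a Series that is still linked to the original DataFrame, which is why the column-by-column calls had an effect.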

df1.loc[:, ["A", "B", "C", "E", "D" ]].replace (r'\s+', np.nan, regex = True, inplace = True)

This line does modify a DataFrame, but it is the temporary copy; since that copy is not assigned to anything, you never see the result.
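To see the effect directly, here is a small sketch that keeps a reference to the selection so it can be compared with the original. The frame, the variable names, and the anchored pattern are illustrative assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column sample with whitespace-only cells
df = pd.DataFrame({"words": ["x", "y"], "A": [1, " "], "B": [" ", "  "]})
cols = ["A", "B"]

# Selecting with a list of columns produces a new DataFrame (a copy)
selection = df.loc[:, cols]
selection.replace(r"^\s*$", np.nan, regex=True, inplace=True)

# The copy now contains NaNs ...
print(int(selection.isna().sum().sum()))  # 3
# ... while the original frame still holds the blank strings
print(int(df[cols].isna().sum().sum()))   # 0
```

Depending on the pandas version, this may emit a warning about operating on a copy; the outcome is the same either way.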

With a sample DataFrame:

df = pd.DataFrame()
df['words'] = ['x', 'y', 'z', 't']
df['A'] = [1, 1, '', '']
df['B'] = ['', '', '', '']
df['C'] = [1, '', 1, '']
df['D'] = ['', '   ', '     ', ' ']
df['E'] = ['    ', ' ', '', '']

df
Out: 
  words  A B  C      D     E
0     x  1    1             
1     y  1                  
2     z       1             
3     t                     

You need to assign the result back:

cols = ["A", "B", "C", "E", "D"]
df.loc[:, cols] = df.loc[:, cols].replace(r'\s+', np.nan, regex=True)

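Note that this will replace only the cells that contain one or more whitespace characters. If you need to replace empty strings too, use a pattern that also matches the empty string. Anchor it with ^ and $, because an unanchored \s* can match inside any string, including cells that hold real text:

df.loc[:, cols] = df.loc[:, cols].replace(r'^\s*$', np.nan, regex=True)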

df
Out: 
  words    A    B    C    D    E
0     x    1  NaN    1  NaN  NaN
1     y    1  NaN  NaN  NaN  NaN
2     z  NaN  NaN    1  NaN  NaN
3     t  NaN  NaN  NaN  NaN  NaN
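The assign-back pattern can also be wrapped in a small helper. This is only a sketch: the function name blanks_to_nan and the sample data are hypothetical, and it returns a new frame rather than modifying the input.

```python
import numpy as np
import pandas as pd


def blanks_to_nan(frame, columns):
    """Return a copy of `frame` where empty or whitespace-only strings
    in `columns` are replaced with NaN; other columns are untouched."""
    result = frame.copy()
    # Anchored pattern: matches only cells made up entirely of whitespace (or nothing)
    result[columns] = result[columns].replace(r"^\s*$", np.nan, regex=True)
    return result


# Hypothetical sample loosely modelled on the question's data
df = pd.DataFrame({
    "words": ["new a", "good v", "never"],
    "A": [1, 1, ""],
    "D": ["", "   ", " "],
})

cleaned = blanks_to_nan(df, ["A", "D"])
print(cleaned)
```

Because only the listed columns are passed to replace, text columns such as words keep their internal spaces.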

Upvotes: 4
