T. Veiga

Reputation: 191

How to replace cells with only a space (" ") in R

I am trying to replace cells with only a space (" ") in R but for some reason it is not working. My vector is something like this:

[1] "SICREDI N/NE"            "SICOOB CREDIMINAS"       "UNICRED SC/PR"          
[4] " "                       " "                       "CRESOL  SC/RS"          

I tried to use CENTRAL <- gsub("\\b \\b", NA, CENTRAL) but it returned:

[1] NA              NA              NA              NA              NA             
[6] "CRESOL  SC/RS" NA              NA              NA              NA 

Upvotes: 0

Views: 159

Answers (2)

Benjamin

Reputation: 17279

A faster approach might be (Gabriel beat me to it):

x <- c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
       " ", " ", "CRESOL SC/RS")
x[x == " "] <- NA

What you are doing with regular expressions can work (note the ^ and $ anchors in the patterns below), but it is quite a bit slower (timings in milliseconds, over 60,000 elements):

x <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
       " ", " ", "CRESOL SC/RS"), 10000)

y <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
       " ", " ", "CRESOL SC/RS"), 10000)

z <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
           " ", " ", "CRESOL SC/RS"), 10000)

library(microbenchmark)
microbenchmark(
  first = {x[x == " "] <- NA},
  second = {y[grepl("^\\b \\b$", y)] <- NA},
  sub = gsub("^\\b \\b$", NA, z)
)

Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
  first  1.223415  1.231626  1.367973  1.235438  1.247461  2.896081   100 a  
 second  5.633810  5.681902  5.929447  5.697737  5.742457  8.063632   100  b 
    sub 16.960371 17.223557 17.345403 17.271795 17.308452 18.919242   100   c

As a matter of opinion, I find x[x == " "] <- NA much easier to read than either of the regex approaches.
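If some of the "empty" cells might hold more than one space, or no characters at all, the exact comparison above would miss them. A minimal sketch of one way to catch those cases, assuming that any whitespace-only cell should become NA (the sample vector here is made up for illustration):

```r
# Hypothetical data: one space, three spaces, and an empty string
x <- c("SICREDI N/NE", " ", "   ", "", "CRESOL SC/RS")

# trimws() strips leading and trailing whitespace, so a blank-only
# cell becomes "", which nzchar() reports as FALSE
blank <- !nzchar(trimws(x))
x[blank] <- NA
x
```

Entries that contain real text keep their internal spaces, because trimws() only touches the ends of each string.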

If you want a slight improvement in speed, you can use x[x %in% " "] <- NA, which is more efficient than ==, but only barely.
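Apart from speed, the two comparisons differ in how they treat values that are already missing. A small sketch, using a made-up vector that contains an NA:

```r
x <- c("SICREDI N/NE", " ", NA, " ")

by_eq <- x == " "     # comparison with NA yields NA
by_in <- x %in% " "   # %in% never returns NA

by_eq
by_in

# Both index vectors work for this assignment; %in% just gives a clean
# TRUE/FALSE mask, which is handy if the mask is reused for subsetting
x[by_in] <- NA
x
```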

(and now I have officially spent too much time exploring this :) )

Upvotes: 2

gfgm

Reputation: 3647

There are spaces inside your words, so the pattern also matches those entries, and when the replacement passed to gsub is NA, every element that contains a match becomes NA as a whole. You can do it like this:

vec <- c("words with spaces", "word with spaces", " ", " ", "not", "here")
vec

[1] "words with spaces"
[2] "word with spaces" 
[3] " "                
[4] " "                
[5] "not"              
[6] "here"

vec[vec==" "]
[1] " " " "

vec[vec==" "] <- NA
vec
[1] "words with spaces"
[2] "word with spaces" 
[3] NA                 
[4] NA                 
[5] "not"              
[6] "here"
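To see why the original gsub call wiped out the multi-word entries, here is a short sketch on a smaller, made-up vector. It uses a plain " " pattern rather than the \\b version from the question, so it illustrates the NA replacement behaviour only:

```r
vec <- c("words with spaces", " ", "not")

# Unanchored pattern: every element that contains a space matches,
# and with replacement = NA the whole matching element becomes NA
unanchored <- gsub(" ", NA, vec)
unanchored

# Anchored pattern: only an element that is exactly one space matches
anchored <- gsub("^ $", NA, vec)
anchored
```

The anchored call gives the same result as vec[vec == " "] <- NA, but the direct comparison is simpler to read.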

Upvotes: 4