user1471980

Reputation: 10626

How do you replace empty cells with 0?

I need to replace empty cells with zero (0) in R. I have a data frame like this:

dput(df)

structure(list(CHANNEL = structure(c(1L, 1L, 1L), .Label = "Native BlackBerry App", class = "factor"), 
    DATE = structure(c(1L, 1L, 1L), .Label = "01/01/2011", class = "factor"), 
    HOUR = structure(c(3L, 1L, 2L), .Label = c("1:00am-2:00am", 
    "2:00am-3:00am", "Midnight-1:00am"), class = "factor"), UNIQUE_USERS = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor"), LOGON_VOLUME = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor")), .Names = c("CHANNEL", 
"DATE", "HOUR", "UNIQUE_USERS", "LOGON_VOLUME"), row.names = c(NA, 
-3L), class = "data.frame")

I have this function:

sapply(df, function (x) 
     as.numeric(gsub("(^ +)|( +$)", "0", x))) 

It does not work; I get these warnings:

[ reached getOption("max.print") -- omitted 422793 rows ]
Warning messages:
1: In FUN(X[[4L]], ...) : NAs introduced by coercion
2: In FUN(X[[4L]], ...) : NAs introduced by coercion
3: In FUN(X[[4L]], ...) : NAs introduced by coercion
4: In FUN(X[[4L]], ...) : NAs introduced by coercion

Update: when I apply this function to df:

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) )

I get this:

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" ""           ""          
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   ""           ""          
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   ""           ""  

Upvotes: 1

Views: 7708

Answers (1)

Simon O'Hanlon

Reputation: 59970

You define an anonymous function in sapply but never use its argument: inside the function, gsub should be given x, not df.

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) ) #===> change df to x

You also coerce everything to numeric, which produces NA for any string that is not a representation of a number. Each column of a data.frame is an atomic vector, so it can hold only one type of data, and sapply simplifies its result to a matrix, which also holds a single type. Because some columns contain text, the common type for all elements is character.
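As a small illustration of the coercion and single-type points (the vector x and the toy data frame here are made up for this sketch, not taken from the question):

```r
# as.numeric() returns NA for any string that is not a number
x <- c("12", "", "1:00am-2:00am")
nums <- suppressWarnings(as.numeric(x))
nums
# 12 NA NA

# sapply() simplifies its result to a matrix, which holds one type only,
# so a mix of text and numeric columns ends up as character
toy <- data.frame(a = c("x", "y"), b = c(1, 2), stringsAsFactors = FALSE)
m <- sapply(toy, function(col) col)
is.character(m)
# TRUE
```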

Perhaps you meant to do this...

sapply( df , gsub , pattern = "^\\s*$" , replacement = 0 )

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "0"          "0"         
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   "0"          "0"         
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   "0"          "0"  
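The reason the pattern from the question leaves the empty cells untouched is that "(^ +)|( +$)" needs at least one space to match, whereas "^\\s*$" also matches a completely empty string. A quick check on a made-up vector (cells is a hypothetical name, not part of the question's data):

```r
cells <- c("", "  ", " 5 ")

# Pattern from the question: requires one or more spaces, so "" is left alone
old <- gsub("(^ +)|( +$)", "0", cells)

# Pattern from this answer: matches empty or whitespace-only strings as a whole
new <- gsub("^\\s*$", "0", cells)

old[1]   # ""
new[1]   # "0"
new[3]   # " 5 " (cells with content are not modified)
```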

Using gsub, you'll have to convert the result to a number afterwards, and you will get NA for any element that is not a character representation of a number. If you need to change entire columns, you could check whether the entire column is empty and replace it with zero if it is. You can't have character elements and numeric elements in the same column.

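# Count the empty (or whitespace-only) cells in each column
len <- colSums( sapply( df , grepl , pattern = "^\\s*$" ) )
# Replace only the columns in which every cell is empty
df[ , len == nrow(df) ] <- 0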
#                CHANNEL       DATE            HOUR UNIQUE_USERS LOGON_VOLUME
#1 Native BlackBerry App 01/01/2011 Midnight-1:00am            0            0
#2 Native BlackBerry App 01/01/2011   1:00am-2:00am            0            0
#3 Native BlackBerry App 01/01/2011   2:00am-3:00am            0            0
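If only some cells in a column are blank, a per-column approach may be closer to what you want. The following is a minimal sketch, not a definitive solution: blank_to_zero is a hypothetical helper name, and the sample values in UNIQUE_USERS are invented for illustration. It fills blank cells with "0" and converts a column to numeric only when every value in it parses as a number.

```r
# Hypothetical helper: blank cells become "0"; the column is converted to
# numeric only if every value then parses as a number.
blank_to_zero <- function(col) {
  col <- as.character(col)              # factors -> their labels, not integer codes
  col[grepl("^\\s*$", col)] <- "0"      # empty or whitespace-only cells
  num <- suppressWarnings(as.numeric(col))
  if (anyNA(num)) col else num          # leave text columns as character
}

# Made-up sample data with the same kind of columns as in the question
df <- data.frame(
  HOUR = c("Midnight-1:00am", "1:00am-2:00am", "2:00am-3:00am"),
  UNIQUE_USERS = c("", "5", " "),
  stringsAsFactors = TRUE
)

# df[] <- lapply(...) keeps the result a data.frame instead of a matrix
df[] <- lapply(df, blank_to_zero)
str(df)
```

Using lapply rather than sapply lets each column keep its own type, so text columns stay character while the filled columns become numeric. Note that a column already containing NA is left as character by this sketch.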

Upvotes: 4
