Reputation: 10626
I need to replace empty cells with zero (0) in R. I have a data frame like this:
dput(df)
structure(list(CHANNEL = structure(c(1L, 1L, 1L), .Label = "Native BlackBerry App", class = "factor"),
DATE = structure(c(1L, 1L, 1L), .Label = "01/01/2011", class = "factor"),
HOUR = structure(c(3L, 1L, 2L), .Label = c("1:00am-2:00am",
"2:00am-3:00am", "Midnight-1:00am"), class = "factor"), UNIQUE_USERS = structure(c(1L,
1L, 1L), .Label = "", class = "factor"), LOGON_VOLUME = structure(c(1L,
1L, 1L), .Label = "", class = "factor")), .Names = c("CHANNEL",
"DATE", "HOUR", "UNIQUE_USERS", "LOGON_VOLUME"), row.names = c(NA,
-3L), class = "data.frame")
I have this function:
sapply(df, function (x)
as.numeric(gsub("(^ +)|( +$)", "0", x)))
It does not work; I get these warnings:
[ reached getOption("max.print") -- omitted 422793 rows ]
Warning messages:
1: In FUN(X[[4L]], ...) : NAs introduced by coercion
2: In FUN(X[[4L]], ...) : NAs introduced by coercion
3: In FUN(X[[4L]], ...) : NAs introduced by coercion
4: In FUN(X[[4L]], ...) : NAs introduced by coercion
Update: when I apply this function to df:
sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) )
I get this:
CHANNEL DATE HOUR UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "" ""
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am" "" ""
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am" "" ""
Upvotes: 1
Views: 7708
Reputation: 59970
You define an anonymous function in sapply but never use the argument to the function.
sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) ) #===> change df to x
You also coerce everything to numeric, which produces NA for any string that contains non-digit characters. Each column of a data.frame is an atomic vector, so it can hold only one type of data, and sapply simplifies its result to a matrix, which also has a single type. The common data type for all elements is therefore character.
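To make the coercion problem concrete, here is a small sketch on made-up data (the names `x`, `num`, `toy` and `m` are illustrative and not from the question):

```r
# as.numeric() returns NA for every string that is not a number
x <- c("12", "", "Midnight-1:00am")
num <- suppressWarnings(as.numeric(x))
num

# sapply() simplifies equal-length results to a matrix, which holds a single type
toy <- data.frame(a = c("x", "y"), b = c("", ""), stringsAsFactors = FALSE)
m <- sapply(toy, gsub, pattern = "^\\s*$", replacement = "0")
class(m[, "b"])   # still character, even though every value is now "0"
```

Only the first element of `num` survives as a number, and column `b` of `m` stays character because it shares a matrix with column `a`.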
Perhaps you meant to do this...
sapply( df , gsub , pattern = "^\\s*$" , replacement = 0 )
CHANNEL DATE HOUR UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "0" "0"
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am" "0" "0"
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am" "0" "0"
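The choice of pattern matters here. As a rough check on a made-up vector (`blank` is an illustrative name), the pattern from the question requires at least one space, so it never matches a truly empty string, while "^\\s*$" matches both empty and whitespace-only strings:

```r
blank <- c("", "   ", "abc")

# Pattern from the question: needs one or more leading or trailing spaces
old_hits <- grepl("(^ +)|( +$)", blank)

# Pattern used above: zero or more whitespace characters and nothing else
new_hits <- grepl("^\\s*$", blank)

old_hits
new_hits
```

That is why the empty cells in UNIQUE_USERS and LOGON_VOLUME came back unchanged in the question's output.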
If you use gsub you will have to convert the result to a number afterwards, and you will get NA for any value that is not a character representation of a number. If you need to change entire columns, you could instead check whether an entire column is empty and replace it with zero if it is. You can't have character elements and numeric elements in the same column.
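# Count the blank (empty or whitespace-only) cells in each column
len <- colSums( sapply( df , grepl , pattern = "^\\s*$" ) )
# Replace only the columns in which every cell is blank
df[ , len == nrow(df) ] <- 0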
# CHANNEL DATE HOUR UNIQUE_USERS LOGON_VOLUME
#1 Native BlackBerry App 01/01/2011 Midnight-1:00am 0 0
#2 Native BlackBerry App 01/01/2011 1:00am-2:00am 0 0
#3 Native BlackBerry App 01/01/2011 2:00am-3:00am 0 0
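If a column mixes blanks with real counts, replacing the whole column would throw data away. A possible alternative, assuming you know by name which columns should be numeric, is to clean them one at a time. The helper `blank_to_zero` and the sample values below are hypothetical, not taken from the question:

```r
# Hypothetical helper: treat blank strings as 0, then convert to numeric
blank_to_zero <- function(x) {
  x <- as.character(x)            # factors must become character first
  x[grepl("^\\s*$", x)] <- "0"    # empty or whitespace-only cells become "0"
  as.numeric(x)                   # anything still non-numeric becomes NA
}

sample_df <- data.frame(HOUR = c("Midnight-1:00am", "1:00am-2:00am"),
                        UNIQUE_USERS = c("", "15"),
                        stringsAsFactors = TRUE)

# Clean only the column that is meant to hold numbers
sample_df$UNIQUE_USERS <- blank_to_zero(sample_df$UNIQUE_USERS)
sample_df$UNIQUE_USERS
```

This leaves text columns such as HOUR untouched and keeps the result a data.frame rather than a character matrix.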
Upvotes: 4