Automatic num-to-char conversion in R using apply and data.table

Question

I'd like to calculate the mean difference of two columns of my data.frame, grouping by a third.

apply doesn't let me do any arithmetic without explicitly converting columns that are already numeric.
data.table does the operation and the grouping, but returns a character vector.
dplyr returns numeric values correctly.

Why does apply() convert numeric vectors to character? Why does data.table convert the results to char?

library(dplyr); library(data.table)
a <- letters[c(1,1:9)]
b <- (1:10)/10
c <- sin(1:10)
dat <- data.frame(a,b,c)
table(dat$a)
typeof(dat$b) #double
dat$bb <- apply(dat, 1,function(x) x["b"])
typeof(dat$bb) #character
dat$bb <- apply(dat, 1,function(x) x["b"]-x["c"])
# Error in x["b"] - x["c"] : non-numeric argument to binary operator
tidydat <- dat %>% group_by(a) %>% summarise(diffr = mean(b-c))
typeof(tidydat$diffr) #double
dt <- data.table(dat)
dt[,bb:=mean(b-c), by=a]
typeof(dt$bb) #character

> dt$bb
 [1] "-0.725384205816789" "-0.725384205816789" "0.158879991940133"  "1.15680249530793"   "1.45892427466314"  
 [6] "0.879415498198926"  "0.0430134012812109" "-0.189358246623382" "0.487881514758243"  "1.54402111088937"  
> tidydat$diffr
[1] -0.7253842  0.1588800  1.1568025  1.4589243  0.8794155  0.0430134 -0.1893582  0.4878815  1.5440211

EDIT: the data.table part is untrue. I was just modifying by reference an already existing character column (thanks to @akrun).

akrun · Accepted Answer

apply converts the dataset from a data.frame to a matrix:

> is.matrix(apply(dat, 1, I))
[1] TRUE

A matrix can hold only a single type, so if any element is character, the whole matrix is coerced to character. Instead, use lapply (if the operation is column-wise), or subset the numeric columns before calling apply:

out <- apply(dat[-1], 1,function(x) x["b"]-x["c"])

-output

> out
 [1] -0.7414710 -0.7092974  0.1588800  1.1568025  1.4589243  0.8794155  0.0430134 -0.1893582  0.4878815  1.5440211
> str(out)
 num [1:10] -0.741 -0.709 0.159 1.157 1.459 ...

The behavior differs because a vector can hold elements of only a single type, and in a data.frame/data.table/tibble the list elements are the columns, not the rows; that is, the class belongs to a column, not to a row. A row extracted as a vector therefore has to be coerced to one common type.
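To make the coercion visible, here is a minimal base R sketch using the same toy data as the question. The names `m` and `diff` are only illustrative, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# Same toy data as in the question
dat <- data.frame(a = letters[c(1, 1:9)],
                  b = (1:10) / 10,
                  c = sin(1:10),
                  stringsAsFactors = FALSE)

# Each column keeps its own class
sapply(dat, class)

# apply() works on as.matrix(dat); one character column
# forces every cell to character
m <- as.matrix(dat)
typeof(m)                    # "character"

# Dropping the character column keeps the matrix numeric
typeof(as.matrix(dat[-1]))   # "double"

# For this task no row-wise loop is needed:
# arithmetic on whole columns is vectorised
dat$diff <- dat$b - dat$c
```

Working on whole columns avoids the matrix conversion altogether, which is why the dplyr and data.table versions return numeric results.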

Regarding the data.table case

> library(data.table)
> dt <- as.data.table(dat)
> dt$bb <- NULL # in case the character column was already created
> dt[,bb:=mean(b-c), by=a]
> str(dt)
Classes ‘data.table’ and 'data.frame':  10 obs. of  4 variables:
 $ a : chr  "a" "a" "b" "c" ...
 $ b : num  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
 $ c : num  0.841 0.909 0.141 -0.757 -0.959 ...
 $ bb: num  -0.725 -0.725 0.159 1.157 1.459 ...
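For comparison, the same per-group mean can also be sketched in base R without apply. This is only an alternative illustration, not part of the original answer; `bb` and `grp` are illustrative names.

```r
# Same toy data as in the question
dat <- data.frame(a = letters[c(1, 1:9)],
                  b = (1:10) / 10,
                  c = sin(1:10),
                  stringsAsFactors = FALSE)

# ave() returns the group mean repeated for every row
# (like data.table's := with by); its default FUN is mean
bb <- ave(dat$b - dat$c, dat$a)

# tapply() returns one value per group (like dplyr's summarise)
grp <- tapply(dat$b - dat$c, dat$a, mean)

typeof(bb)    # "double"
length(bb)    # 10, one value per row
length(grp)   # 9, one value per group
```

Both results stay numeric because the arithmetic is done on the numeric columns directly and the character column is used only for grouping.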
