DavidF

Reputation: 91

Using functions in j with by - basic question

I need to understand why sometimes you need to use a "by", and sometimes you don't. I'm really new to both R and data.table, so it is probably something basic.

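library(data.table)

a  <- c("A", "B", "C")
b  <- c("AA", "BBB", "CCC")
x1 <- c(2, 4, 8)
x2 <- c(2, 4, 1)
n1 <- c(9, 9, 9)
n2 <- c(10, 10, 10)

DT <- data.table(a, b, x1, x2, n1, n2)

# test1: nchar() on a column, no by
test1 <- DT[, .(y = nchar(b))]
# test2: prop.test() p-value, no by
test2 <- DT[, .(pv1 = prop.test(c(x1, x2), c(n1, n2))$p.value)]
# test3: same as test2, grouped by a
test3 <- DT[, .(pv1 = prop.test(c(x1, x2), c(n1, n2))$p.value), by = 'a']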

test1 behaves as I expected: it returns a data.table with 3 observations and 1 variable.

test2 confused me. I get only 1 observation back.

test3 is how I got the answer I expected.

I don't understand why test2 did not operate row-wise like test1 did. When do you need to use a by= if you want to process every row in the table?

Thanks for your help,

David

Upvotes: 1

Views: 55

Answers (1)

tofd

Reputation: 620

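Neither call is really row-wise. Without a 'by' argument, j is evaluated once, with each column passed in as a whole vector. nchar() is vectorized: it takes a vector and returns a vector of the same length, which is why test1 looks row-wise. prop.test(...)$p.value, like sum() or mean(), takes a vector (or vectors) and reduces it to a single value. Thus, without a 'by' argument, the function operates across the whole data table (no sub-groupings) and returns a single value. by = 'a' gives you one result per row here only because every value of a is unique, so each group contains exactly one row.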
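To make the difference concrete, here is a minimal sketch using the data from the question. The column name `row_id` is made up for this illustration, and grouping on a row index is just one way to force per-row evaluation when no column is guaranteed to be unique. The `suppressWarnings()` call is only there because prop.test() tends to warn about the chi-squared approximation with counts this small.

```r
library(data.table)

DT <- data.table(
  a  = c("A", "B", "C"),
  b  = c("AA", "BBB", "CCC"),
  x1 = c(2, 4, 8),
  x2 = c(2, 4, 1),
  n1 = c(9, 9, 9),
  n2 = c(10, 10, 10)
)

# Vectorized function: a length-3 column goes in, a length-3 vector comes out.
vec <- DT[, .(y = nchar(b))]

# Reducing function: the whole column collapses to one value, so one row comes back.
red <- DT[, .(total = sum(x1))]

# Grouping on a per-row index evaluates j once per row.
# `row_id` is a hypothetical name chosen for this sketch.
res <- DT[, .(pv1 = suppressWarnings(
                prop.test(c(x1, x2), c(n1, n2))$p.value)),
          by = .(row_id = seq_len(nrow(DT)))]

nrow(vec)  # one row per input row
nrow(red)  # a single row
nrow(res)  # one row per group, i.e. per input row
```

Grouping by a row index rather than by a would still give one p-value per row if a ever contained duplicates.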

Upvotes: 3
