Ananta

Reputation: 3711

Tabulate a data frame in R

I want to tabulate a data frame so that the levels of one factor variable become columns and the cells hold the values of another variable.

Here is some example data:

a <- rep(1:3, 3)
d <- rep(1:3, each = 3)
b <- rnorm(9)
c <- runif(9)
dt <- data.frame(a, d, b, c)

  a d          b         c
1 1 1  0.3819762 0.5199602
2 2 1  0.3896063 0.9144730
3 3 1  2.4356972 0.2888464
4 1 2  1.2697016 0.9831191
5 2 2 -1.9844689 0.2046947
6 3 2  0.3473766 0.4766178
7 1 3 -1.5461235 0.6187189
8 2 3  1.0829027 0.9089551
9 3 3 -0.1305324 0.6326141

I looked at data.table, plyr, and reshape2 but could not find a way to do this, so I fell back on a plain loop:

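mat <- matrix(NA, nrow = 3, ncol = 4)

for (i in 1:3) {
  mat[i, 1] <- i  # first column holds the value of a
  for (j in 1:3) {
    # value of b for the row where a == i and d == j
    mat[i, j + 1] <- dt$b[dt$a == i & dt$d == j]
  }
}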

mat
     [,1]      [,2]       [,3]       [,4]
[1,]    1 0.3819762  1.2697016 -1.5461235
[2,]    2 0.3896063 -1.9844689  1.0829027
[3,]    3 2.4356972  0.3473766 -0.1305324

... but this takes forever on big data.

Is there a better option?

Upvotes: 0

Views: 3405

Answers (3)

Jack Ryan

Reputation: 2144

Using reshape2:

> library(reshape2)
> dcast(dt, a ~ d, value.var = "b")
  a         1          2          3
1 1 0.3819762  1.2697016 -1.5461235
2 2 0.3896063 -1.9844689  1.0829027
3 3 2.4356972  0.3473766 -0.1305324

Upvotes: 1

Frank

Reputation: 66819

This can be done in base R also:

reshape(dt,timevar="d",idvar="a",drop="c",direction="wide")

For your data, this gives...

  a       b.1        b.2        b.3
1 1 0.3819762  1.2697016 -1.5461235
2 2 0.3896063 -1.9844689  1.0829027
3 3 2.4356972  0.3473766 -0.1305324

Please use set.seed before drawing simulated data, so that it is easier to reproduce.

I don't know whether this solution will be fast. Also, to use it in the future, you have to get used to confusing argument names ("timevar", "idvar", etc.) that probably don't describe what you're actually doing most of the time.
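
To illustrate the set.seed point, here is a minimal sketch that regenerates the question's data reproducibly and compares the reshape() result with xtabs(), another base R route. The seed value 1 is an arbitrary choice of mine, so the numbers will differ from those printed in the question. Note that xtabs() sums the values in each cell, so it only reproduces the table when each (a, d) pair occurs exactly once, as it does here.

```r
# Reproducible version of the example data (seed value is arbitrary)
set.seed(1)
dt <- data.frame(a = rep(1:3, 3),
                 d = rep(1:3, each = 3),
                 b = rnorm(9),
                 c = runif(9))

# Wide format via reshape(): one row per a, one b.<d> column per level of d
wide <- reshape(dt, timevar = "d", idvar = "a", drop = "c", direction = "wide")

# Same layout via xtabs(): rows are a, columns are d, cells are the sum of b
tab <- xtabs(b ~ a + d, data = dt)

print(wide)
print(tab)
```

The formula in xtabs() names the row and column variables directly, which may be easier to remember than timevar and idvar, at the cost of returning a table rather than a data frame.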

Upvotes: 2

eddi

Reputation: 49448

Here's a data.table option:

library(data.table)
dt = data.table(dt)

dt[, as.list(b), by = a]
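
One caveat, as far as I can tell: as.list(b) fills the columns by row position within each group, so it relies on the rows being ordered by d within each a, and the resulting columns are named V1, V2, V3. A sketch of a variant that keys the columns on d explicitly, using the dcast method that data.table provides, is below. The object names DT, by_position and by_key and the seed are my own, chosen for illustration.

```r
library(data.table)

# Reproducible copy of the example data (seed value is arbitrary)
set.seed(1)
DT <- data.table(a = rep(1:3, 3),
                 d = rep(1:3, each = 3),
                 b = rnorm(9),
                 c = runif(9))

# Positional approach from the answer: columns a, V1, V2, V3
by_position <- DT[, as.list(b), by = a]

# Keyed approach: columns are named after the values of d
by_key <- dcast(DT, a ~ d, value.var = "b")

print(by_position)
print(by_key)
```

For this data both give the same values; they would differ if the rows were shuffled or if some (a, d) combinations were missing.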

Upvotes: 2
