llrs

Reputation: 3397

Merge the results of table in R

I want to count the occurrences of the three factor levels for each column of mydata, so I thought of the function table.

Some data of mydata:

              A0AUT     A0AYT     A0AZT     A0B2T     A0B3T
100130426 no_change no_change no_change no_change no_change
100133144 no_change no_change      down no_change no_change
100134869 no_change no_change no_change no_change no_change
10357     no_change        up no_change no_change        up
10431     no_change        up no_change no_change no_change
136542    no_change        up no_change no_change no_change
> str(mydata)
'data.frame':   20531 obs. of  518 variables:
 $ A0AUT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ A0AYT: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 3 3 2 2 2 3 ...
 $ A0AZT: Factor w/ 3 levels "down","no_change",..: 2 1 2 2 2 2 1 2 2 2 ...
 $ A0B2T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
 $ A0B3T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
 $ A0B5T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
 $ A0B7T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
 $ A0B8T: Factor w/ 3 levels "down","no_change",..: 2 1 1 2 3 2 2 2 2 2 ...
 $ A0BAT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ A0BCT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 3 2 2 2 2 2 ...

Now I do:

occurences <- apply(mydata, 1, table)
> occurences[[1]] # 100130426

no_change        up 
      508        10 
> occurences[[2]] # 100133144

     down no_change        up 
       45       446        27 

But I want them as a matrix (or at least I think it is easier to deal with) so I made this:

  freq <- sapply(occurences, function(x){
    c(x, rep(0, 3 - length(x)))
  })

> freq[,1:5]
          100130426 100133144 100134869 10357 10431
no_change       508        45        14     3     3
up               10       446       411   330   268
                  0        27        93   185   247

However, as you can see, the number of no_change for 100133144 ended up in the up row!

My expected output would be:

> freq[,1:5]
              100130426 100133144 100134869 10357 10431
    up               10        45        14     3     3
    no_change       508       446       411   330   268
    down              0        27        93   185   247

How can I make sure each value lands in the correct row? As you can see, each table may have only one to three elements, so doing:

freq <- matrix(unlist(occurences), nrow=3)

does not work, because the total number of elements is not a multiple of 3.

I might have taken a bad approach to counting the frequencies of mydata by column. I would prefer an approach that uses just base R, without any additional library.

Upvotes: 1

Views: 893

Answers (2)

akrun

Reputation: 886938

We can do this with table. Convert the 'data.frame' to a 'matrix', reshape it from 'wide' to 'long' (using melt from reshape2), and call table on the relevant columns to get the frequency counts.

library(reshape2)
table(melt(as.matrix(mydata))[c(3,1)])
#              Var1
#value       10357 10431 136542 100130426 100133144 100134869
#  down          0     0      0         0         1         0
#  no_change     3     4      4         5         4         5
#  up            2     1      1         0         0         0

Or, using only base R, we can unlist the data to get a vector, replicate the 'row names' to match it (using row, because unlist goes through the data column by column), and then call table

table(unlist(mydata), row.names(mydata)[row(mydata)])
#            100130426 100133144 100134869 10357 10431 136542
#  down              0         1         0     0     0      0
#  no_change         5         4         5     3     4      4
#  up                0         0         0     2     1      1
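As a minimal sketch of the same base R idea with the row order fixed by hand: wrapping the unlisted values in factor() with explicit levels makes every level show up, in the chosen order, even when a row never contains it. The data frame `toy` and the vector `lv` are made-up names used only for this illustration.

```r
# Toy data (hypothetical): two row ids, three sample columns
toy <- data.frame(
  s1 = c("no_change", "up"),
  s2 = c("no_change", "down"),
  s3 = c("up", "no_change"),
  row.names = c("g1", "g2"),
  stringsAsFactors = FALSE
)
lv <- c("up", "no_change", "down")   # desired row order of the result

# unlist() walks the data column by column, so row(toy) supplies the
# matching row index for every value
vals <- factor(unlist(toy, use.names = FALSE), levels = lv)
ids  <- row.names(toy)[row(toy)]

freq <- table(vals, ids)
freq
```

Levels that never occur for a given row id are reported as 0 rather than dropped, which is what keeps the rows aligned.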

Another option is dplyr/tidyr

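library(dplyr)
library(tidyr)
library(tibble)
rownames_to_column(mydata) %>%        # row names become a 'rowname' column
    gather(Var, Val, -rowname) %>%    # reshape from wide to long
    count(rowname, Val) %>%           # frequency per row id and level (column 'n')
    spread(rowname, n, fill=0)        # one column per row id, missing counts as 0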

Update

If the dataset columns are factors, we can convert them to character class before doing the unlist

mydata[] <- lapply(mydata, as.character)
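A small sketch of what that conversion does, on a made-up data frame `toy_f` (a hypothetical name, not part of the original data): assigning into `toy_f[]` replaces the column contents but keeps the data.frame structure, so unlist afterwards returns a plain character vector.

```r
# Hypothetical data frame with factor columns
toy_f <- data.frame(
  s1 = factor(c("no_change", "up")),
  s2 = factor(c("down", "no_change"))
)

# Replace every column by its character version, keeping names and row names
toy_f[] <- lapply(toy_f, as.character)

sapply(toy_f, class)                      # class of each column after conversion
flat <- unlist(toy_f, use.names = FALSE)  # column-by-column character vector
flat
```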

Update2

If the counts are needed for each row:

library(qdapTools)
t(mtabulate(as.data.frame(t(mydata))))
#          100130426 100133144 100134869 10357 10431 136542
#no_change         5         4         5     3     4      4
#down              0         1         0     0     0      0
#up                0         0         0     2     1      1

Or, using only base R: create a vector of the unique elements in the dataset ('nm1'; here it is already known, but if it is not, nm1 <- unique(unlist(lapply(mydata, as.character)))). Then loop over the rows using apply with MARGIN=1, convert each row vector to a factor with levels specified as 'nm1', and call tabulate on it. In tabulate, we can also specify the length of the returned vector, i.e. the length of 'nm1', so that levels missing from a row still get a 0. The output is a matrix, to which we assign 'nm1' as the row names (row.names<-).

nm1 <- c('up', 'no_change', 'down')
`row.names<-`(apply(mydata, 1, function(x)
     tabulate(factor(x, levels=nm1),length(nm1))), nm1)
#          100130426 100133144 100134869 10357 10431 136542
#up                0         0         0     2     1      1
#no_change         5         4         5     3     4      4
#down              0         1         0     0     0      0
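For comparison, a sketch that stays closer to the apply/table call from the question. It assumes the three levels are known in advance; `toy`, `lv` and `freq2` are made-up names for this illustration. Because every row is turned into a factor with the same levels, each table has exactly three entries in the same order, and apply can bind them into a matrix without any padding.

```r
# Toy data (hypothetical): two row ids, three sample columns
toy <- data.frame(
  s1 = c("no_change", "up"),
  s2 = c("no_change", "down"),
  s3 = c("up", "no_change"),
  row.names = c("g1", "g2"),
  stringsAsFactors = FALSE
)
lv <- c("up", "no_change", "down")   # assumed known set of levels

# Each row yields a length-3 table, so the results simplify to a 3 x n matrix
freq2 <- apply(toy, 1, function(x) table(factor(x, levels = lv)))
freq2
```

The row names of `freq2` come from the factor levels and the column names from the row names of the data, so no separate `row.names<-` step is needed here.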

data

mydata <- structure(list(A0AUT = c("no_change", "no_change", 
"no_change", 
"no_change", "no_change", "no_change"), A0AYT = c("no_change", 
"no_change", "no_change", "up", "up", "up"), A0AZT = c("no_change", 
"down", "no_change", "no_change", "no_change", "no_change"), 
    A0B2T = c("no_change", "no_change", "no_change", "no_change", 
    "no_change", "no_change"), A0B3T = c("no_change", "no_change", 
    "no_change", "up", "no_change", "no_change")),
 .Names = c("A0AUT", 
"A0AYT", "A0AZT", "A0B2T", "A0B3T"), class = "data.frame",
 row.names = c("100130426", 
"100133144", "100134869", "10357", "10431", "136542"))

Upvotes: 3

Jaap

Reputation: 83215

Promoting my comment to an answer:

library(reshape2)
dcast(melt(mydf, id="id"), value ~ id, length)

This assumes that the numbers are stored in an id variable. If they are stored as row names:

dcast(melt(as.matrix(mydf)), value ~ Var1)

Both give:

      value 10357 10431 136542 100130426 100133144 100134869
1      down     0     0      0         0         1         0
2 no_change     3     4      4         5         4         5
3        up     2     1      1         0         0         0

Upvotes: 2
