Reputation: 3397
I want to count the occurrences of the three factor levels for each column of mydata, so I thought of using the table function.
A sample of mydata:
A0AUT A0AYT A0AZT A0B2T A0B3T
100130426 no_change no_change no_change no_change no_change
100133144 no_change no_change down no_change no_change
100134869 no_change no_change no_change no_change no_change
10357 no_change up no_change no_change up
10431 no_change up no_change no_change no_change
136542 no_change up no_change no_change no_change
> str(mydata)
'data.frame': 20531 obs. of 518 variables:
$ A0AUT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
$ A0AYT: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 3 3 2 2 2 3 ...
$ A0AZT: Factor w/ 3 levels "down","no_change",..: 2 1 2 2 2 2 1 2 2 2 ...
$ A0B2T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
$ A0B3T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
$ A0B5T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
$ A0B7T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
$ A0B8T: Factor w/ 3 levels "down","no_change",..: 2 1 1 2 3 2 2 2 2 2 ...
$ A0BAT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
$ A0BCT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 3 2 2 2 2 2 ...
Now I do:
occurences <- apply(mydata, 1, table)
> occurences[[1]] # 100130426
no_change up
508 10
> occurences[[2]] # 100133144
down no_change up
45 446 27
But I want them as a matrix (or at least I think it is easier to deal with) so I made this:
freq <- sapply(occurences, function(x){
c(x, rep(0, 3 - length(x)))
})
> freq[,1:5]
100130426 100133144 100134869 10357 10431
no_change 508 45 14 3 3
up 10 446 411 330 268
0 27 93 185 247
However as you can see the number of no_change for 100133144 went to the up row!
My expected output would be:
> freq[,1:5]
100130426 100133144 100134869 10357 10431
up 10 45 14 3 3
no_change 508 446 411 330 268
down 0 27 93 185 247
How can I make sure each count ends up in the correct row? As you can see, each table may have anywhere from one to three elements, so doing:
freq <- matrix(unlist(occurences), nrow=3)
results in an error, because the length of the unlisted vector is not a multiple of 3.
I might have taken a bad approach to counting the frequencies of mydata by column. I would prefer an approach that uses only base R, without any additional library.
Upvotes: 1
Views: 893
Reputation: 886938
We can do this with table. Convert the 'data.frame' to a 'matrix' and reshape it from 'wide' to 'long' (using melt from reshape2), then call table on the relevant columns to get the frequency count.
library(reshape2)
table(melt(as.matrix(mydata))[c(3,1)])
# Var1
#value 10357 10431 136542 100130426 100133144 100134869
# down 0 0 0 0 1 0
# no_change 3 4 4 5 4 5
# up 2 1 1 0 0 0
Or using only base R, we can just unlist the data to get a vector, replicate the row names (using row, so that each value is paired with the row it came from), and then call table. The row names are treated as character here, so the columns are sorted as strings.
table(unlist(mydata), row.names(mydata)[row(mydata)])
#            100130426 100133144 100134869 10357 10431 136542
#  down              0         1         0     0     0      0
#  no_change         5         4         5     3     4      4
#  up                0         0         0     2     1      1
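To see why the row index is the one that lines up with unlist, here is a minimal sketch on a hypothetical two-row data frame (the names toy, vals and ids are made up for illustration and are not part of the question). unlist walks the data frame column by column, and row() returns the row number of each cell in that same order.

```r
# Hypothetical toy data: two "genes" (rows) and two samples (columns)
toy <- data.frame(a = c("up", "down"),
                  b = c("down", "down"),
                  row.names = c("g1", "g2"),
                  stringsAsFactors = FALSE)

# Values in column-major order: all of column a, then all of column b
vals <- unlist(toy, use.names = FALSE)

# Row name of each cell, in the same column-major order
ids <- row.names(toy)[row(toy)]

# Cross-tabulate value against the row it came from
tab <- table(vals, ids)
print(tab)
```

Using col() in place of row() would pair each value with a column number instead, which only indexes the row names by accident.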
Another option is dplyr/tidyr
library(dplyr)
library(tidyr)
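# rownames_to_column() from tibble replaces the deprecated add_rownames()
tibble::rownames_to_column(mydata, "rowname") %>%
  gather(Var, Val, -rowname) %>%
  group_by(rowname, Val) %>%
  summarise(n = n()) %>%
  spread(rowname, n, fill = 0)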
If the dataset columns are of class factor, we can convert them to character before doing the unlist:
mydata[] <- lapply(mydata, as.character)
If the count is needed for each row, one option is mtabulate from qdapTools:
library(qdapTools)
t(mtabulate(as.data.frame(t(mydata))))
# 100130426 100133144 100134869 10357 10431 136542
#no_change 5 4 5 3 4 4
#down 0 1 0 0 0 0
#up 0 0 0 2 1 1
Or using only base R, we create a vector of the unique elements in the dataset ('nm1'; here it is already known, but if it is not, nm1 <- unique(unlist(lapply(mydata, as.character)))), then loop over the rows using apply with MARGIN=1 and use tabulate after converting each row vector to a factor with levels specified as 'nm1'. In tabulate, we can also specify the length of the returned vector, i.e. the length of 'nm1'. The output will be a matrix, and we can assign 'nm1' as its row names (with row.names<-).
nm1 <- c('up', 'no_change', 'down')
`row.names<-`(apply(mydata, 1, function(x)
tabulate(factor(x, levels=nm1),length(nm1))), nm1)
# 100130426 100133144 100134869 10357 10431 136542
#up 0 0 0 2 1 1
#no_change 5 4 5 3 4 4
#down 0 1 0 0 0 0
# Sample data used in the examples above
mydata <- structure(list(
    A0AUT = c("no_change", "no_change", "no_change",
              "no_change", "no_change", "no_change"),
    A0AYT = c("no_change", "no_change", "no_change", "up", "up", "up"),
    A0AZT = c("no_change", "down", "no_change",
              "no_change", "no_change", "no_change"),
    A0B2T = c("no_change", "no_change", "no_change",
              "no_change", "no_change", "no_change"),
    A0B3T = c("no_change", "no_change", "no_change",
              "up", "no_change", "no_change")),
  .Names = c("A0AUT", "A0AYT", "A0AZT", "A0B2T", "A0B3T"),
  class = "data.frame",
  row.names = c("100130426", "100133144", "100134869",
                "10357", "10431", "136542"))
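As a sketch of a variant that stays close to the apply/table code in the question: the padding with zeros misplaces counts because each table only contains the levels present in that row. Converting each row to a factor with fixed levels first gives every table the same three entries in the same order, so no padding is needed. The data frame toy and the vector lv below are hypothetical names used only for this illustration.

```r
# Hypothetical toy data: two rows, three samples
toy <- data.frame(s1 = c("no_change", "down"),
                  s2 = c("up", "no_change"),
                  s3 = c("no_change", "no_change"),
                  row.names = c("geneA", "geneB"),
                  stringsAsFactors = FALSE)

# Fixed level order; levels absent from a row are counted as 0
lv <- c("up", "no_change", "down")

# One table per row, all of length 3, so apply() returns a 3 x nrow matrix
freq <- apply(toy, 1, function(x) table(factor(x, levels = lv)))
print(freq)
```

Because every per-row table has the same names, apply can bind them into a matrix whose rows are the levels and whose columns are the original row names.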
Upvotes: 3
Reputation: 83215
Promoting my comment to an answer:
library(reshape2)
dcast(melt(mydf, id="id"), value ~ id, length)
This assumes that the numbers are stored in an id variable. If they are stored as row names instead:
dcast(melt(as.matrix(mydf)), value ~ Var1)
Both give:
value 10357 10431 136542 100130426 100133144 100134869
1 down 0 0 0 0 1 0
2 no_change 3 4 4 5 4 5
3 up 2 1 1 0 0 0
Upvotes: 2