Reputation: 11490
We have the DF
df <- data.frame(group=as.factor(rep(c("UP","DOWN"),6)),variables=(rep(c("sex","smoke","sport"),each=4))
,values=as.factor(c(1,1,1,0 ,1,1,0,0, 1,1,1,1)))
group variables values
1 UP sex 1
2 DOWN sex 1
3 UP sex 1
4 DOWN sex 0
5 UP smoke 1
6 DOWN smoke 1
7 UP smoke 0
8 DOWN smoke 0
9 UP sport 1
10 DOWN sport 1
11 UP sport 1
12 DOWN sport 1
>
Now I want to know all counts of all levels
library(plyr)
This command does almost perfectly what I want
count(df, c("variables", "group", "values"))
variables group values freq
1 sex DOWN 0 1
2 sex DOWN 1 1
3 sex UP 1 2
4 smoke DOWN 0 1
5 smoke DOWN 1 1
6 smoke UP 0 1
7 smoke UP 1 1
8 sport DOWN 1 2
9 sport UP 1 2
I also want to count the factor levels that arent observed. Like I did per hand in the following output.
variables group values freq
1 sex DOWN 0 1
2 sex DOWN 1 1
3 sex UP 0 0 <--
4 sex UP 1 2
5 smoke DOWN 0 1
6 smoke DOWN 1 1
7 smoke UP 0 1
8 smoke UP 1 1
9 sport DOWN 0 0 <--
10 sport DOWN 1 2
11 sport UP 0 0 <--
12 sport UP 1 2
How can I achieve the above output?
Upvotes: 0
Views: 111
Reputation: 42564
You can do this also with data.table
with less lines of code:
library(data.table)
dt <- setDT(df)
cj <- CJ(dt$variables, dt$group, dt$values, unique = TRUE)
dt[, .N, keyby = c("variables", "group", "values")][cj][is.na(N), N := 0]
print(dt)
variables group values N
1: sex DOWN 0 1
2: sex DOWN 1 1
3: sex UP 0 0
4: sex UP 1 2
5: smoke DOWN 0 1
6: smoke DOWN 1 1
7: smoke UP 0 1
8: smoke UP 1 1
9: sport DOWN 0 0
10: sport DOWN 1 2
11: sport UP 0 0
12: sport UP 1 2
setDT()
converts a data.frame
to a data.table
by reference, ie, without copying.
CJ()
is a cross join. It forms a data.table
from the cross product of the vectors. Thus, it's the data.table
version of expand.grid
.
The parameter unique = TRUE
is a convenient alternative to wrapping each argument in level()
or unique()
.
The counting by groups is done with dt[, .N, keyby = c("variables", "group", "values")]
:
variables group values N
1: sex DOWN 0 1
2: sex DOWN 1 1
3: sex UP 1 2
4: smoke DOWN 0 1
5: smoke DOWN 1 1
6: smoke UP 0 1
7: smoke UP 1 1
8: sport DOWN 1 2
9: sport UP 1 2
Now, dt[, .N, keyby = c("variables", "group", "values")][cj]
(right) joins the CJ()
result with all possible combinations.
Finally, [is.na(N), N := 0]
replaces all NA
s in column N
by 0
.
Upvotes: 2
Reputation: 51592
Here is one idea. You can replicate the rows based on freq
variable. If the freq is 2 then the replicated row will have a unique rowname which we can target and change its freq
and values
to 0.
df1 <- plyr::count(df, c("variables", "group", "values"))
df2 <- df1[rep(row.names(df1), df1$freq),]
df2$freq[grep('.', row.names(df2), fixed = TRUE)] <- 0
df2$values[df2$freq == 0] <- 0
df2
# variables group values freq
#1 sex DOWN 0 1
#2 sex DOWN 1 1
#3 sex UP 1 2
#3.1 sex UP 0 0
#4 smoke DOWN 0 1
#5 smoke DOWN 1 1
#6 smoke UP 0 1
#7 smoke UP 1 1
#8 sport DOWN 1 2
#8.1 sport DOWN 0 0
#9 sport UP 1 2
#9.1 sport UP 0 0
If you want to reset your rownames then,
row.names(df2) <- NULL
Upvotes: 0
Reputation: 12935
You could also do:
library(plyr)
d1 <- count(df, c("variables", "group", "values"))
d2 <- expand.grid(list(levels(df$variables), levels(df$group), levels(df$values)))
d2$freq <- 0
colnames(d2) <- colnames(d1)
m <- merge(d1, d2, by = c("variables", "group", "values"), all.y = T)[,-5]
m[is.na(m)] <- 0
# variables group values freq.x
# 1 sex DOWN 0 1
# 2 sex DOWN 1 1
# 3 sex UP 0 0
# 4 sex UP 1 2
# 5 smoke DOWN 0 1
# 6 smoke DOWN 1 1
# 7 smoke UP 0 1
# 8 smoke UP 1 1
# 9 sport DOWN 0 0
# 10 sport DOWN 1 2
# 11 sport UP 0 0
# 12 sport UP 1 2
The idea is to make a data frame (named d2
) in which all possible combinations of variables
and group
and values
are generated and then merge it with d1
.
Upvotes: 1