Reputation: 155
I am using the tableby function from the arsenal package. I want to see the missing values for all combinations of two categorical variables (Current Level on the vertical axis; Initial Level on the horizontal axis). I am using code like this:
library(arsenal)
mycontrols <- tableby.control(test = FALSE,
                              numeric.stats = c("Nmiss", "N", "mean", "median", "q1q3"),
                              cat.stats = c("Nmiss2", "countpct"),
                              stats.labels = list(N = 'Count', median = 'Median', q1q3 = 'Q1,Q3'))
# Column names containing spaces need backticks; `data = .` assumes this call sits inside a pipe
summary(tableby(factor(`Current Level`) ~ `Initial Level`, data = ., control = mycontrols))
I am getting output that looks something like this (ignore the numbers; they are made up).
| | Start (N=9000) | Beginning (N=10981) | Intermediate (N=8499) | Almost (N=16846) | Final (N=6582) | Total (N=51432) |
|---|---|---|---|---|---|---|
| Current Level | | | | | | |
| - N-Miss | 28 | 40 | 35 | 29 | 0 | |
| - Start | 8440 (99.3%) | 3 (0.0%) | 5 (0.1%) | 1 (0.0%) | 0 (0.0%) | |
| - Beginning | 84 (0.4%) | 10829 (99.0%) | 8 (0.1%) | 4 (0.0%) | 0 (0.0%) | |
| - Intermediate | 66 (0.2%) | 58 (0.5%) | 8364 (98.8%) | 47 (0.3%) | 0 (0.0%) | |
| - Almost | 5 (0.1%) | 48 (0.4%) | 71 (0.8%) | 16697 (99.3%) | 0 (0.0%) | |
| - Final | 0 (0.0%) | 3 (0.0%) | 16 (0.2%) | 68 (0.4%) | 6582 (100.0%) | |
But I want output that also shows the counts and percentages for the rows that have a current level but are missing an initial level. See below (ignore the numbers).
| | Start (N=9000) | Beginning (N=10981) | Intermediate (N=8499) | Almost (N=16846) | Final (N=6582) | N-Miss | Total (N=51432) |
|---|---|---|---|---|---|---|---|
| Current Level | | | | | | | |
| - N-Miss | 28 | 40 | 35 | 29 | 0 | 122 | |
| - Start | 8440 (99.3%) | 3 (0.0%) | 5 (0.1%) | 1 (0.0%) | 0 (0.0%) | 8224 (16.5%) | |
| - Beginning | 84 (0.4%) | 10829 (99.0%) | 8 (0.1%) | 4 (0.0%) | 0 (0.0%) | 1075 (21.2%) | |
| - Intermediate | 66 (0.2%) | 58 (0.5%) | 8364 (98.8%) | 47 (0.3%) | 0 (0.0%) | 845 (16.5%) | |
| - Almost | 5 (0.1%) | 48 (0.4%) | 71 (0.8%) | 16697 (99.3%) | 0 (0.0%) | 6824 (32.8%) | |
| - Final | 0 (0.0%) | 3 (0.0%) | 16 (0.2%) | 68 (0.4%) | 6582 (100.0%) | 672 (13.0%) | |
Does tableby have this functionality to add these missing values?
Upvotes: 0
Views: 653
Reputation: 165
I'm not aware of tableby having built-in functionality to add these missing values for your grouping variable. A workaround is to replace the NA values in your 'Initial Level' variable with the string "Missing", which produces an extra column in your table.
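library(arsenal)
# Create some dummy data (seeded so the example is reproducible)
set.seed(1)
my_df <- data.frame(
  initial = sample(c("Beginning", "Intermediate", "Almost", "Final"), 9000, replace = TRUE),
  current = sample(c("Beginning", "Intermediate", "Almost", "Final"), 9000, replace = TRUE),
  stringsAsFactors = FALSE  # keep character columns so ifelse() below returns labels, not codes (R < 4.0)
)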
# Add some random NA values to both "initial" and "current"
for (i in 1:200) {
my_df[sample(nrow(my_df),1),sample(ncol(my_df),1)] <- NA
}
# Replace the NA values with "Missing" in your initial variable
my_df$initial <- ifelse(is.na(my_df$initial), "Missing", my_df$initial)
# Factor both variables to order the columns
my_df$initial <- factor(my_df$initial, levels = c("Beginning", "Intermediate", "Almost", "Final", "Missing"))
my_df$current <- factor(my_df$current, levels = c("Beginning", "Intermediate", "Almost", "Final"))
# Your controls
mycontrols <- tableby.control(test=FALSE, numeric.stats=c("Nmiss", "N", "mean", "median", "q1q3"),
cat.stats=c("Nmiss2", "countpct"),
stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
# Table
summary(tableby(initial ~ current, data = my_df, control = mycontrols))
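If you would rather not go through ifelse(), the same recoding can be sketched in base R with addNA(), and table() gives a quick cross-check of the counts you expect to see in the new column. This is only a sketch: the toy data frame and the names toy, lvls and xtab are made up for illustration and are not from the question.

```r
# Hypothetical toy data (not from the question): two level columns with some NAs
lvls <- c("Beginning", "Intermediate", "Almost", "Final")
toy <- data.frame(
  initial = c("Beginning", NA, "Almost", "Final", NA, "Intermediate"),
  current = c("Beginning", "Almost", NA, "Final", "Final", "Intermediate"),
  stringsAsFactors = FALSE
)

# Make NA an explicit level of the grouping variable, then rename it "Missing"
toy$initial <- addNA(factor(toy$initial, levels = lvls))
levels(toy$initial)[is.na(levels(toy$initial))] <- "Missing"

# Leave NAs in the row variable alone so they can still be reported as N-Miss
toy$current <- factor(toy$current, levels = lvls)

# Cross-tabulate current (rows) by initial (columns) as a sanity check
xtab <- table(current = toy$current, initial = toy$initial, useNA = "ifany")
print(xtab)
```

A data frame recoded this way should be usable in the same tableby(initial ~ current, ...) call as above, since "Missing" is then an ordinary factor level.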
Upvotes: 0