atirvine88

Reputation: 155

How do you add missing values for rows and columns with function "tableby" from R package 'arsenal'?

I am using the tableby function from the arsenal package. I want to see the missing values for all combinations of two categorical variables (Current Level on the vertical axis, Initial Level on the horizontal axis). I am using code like this:

mycontrols  <- tableby.control(test=FALSE, numeric.stats=c("Nmiss", "N", "mean", "median", "q1q3"),
                               cat.stats=c("Nmiss2", "countpct"),
                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
summary(tableby(factor(`Current Level`) ~ `Initial Level`, data = ., control = mycontrols))

I am getting output that looks something like this (ignore the numbers; they are made up).

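|                | Start (N=9000) | Beginning (N=10981) | Intermediate (N=8499) | Almost (N=16846) | Final (N=6582) | Total (N=51432) |
|----------------|----------------|---------------------|-----------------------|------------------|----------------|-----------------|
| Current Level  |                |                     |                       |                  |                |                 |
| - N-Miss       | 28             | 40                  | 35                    | 29               | 0              |                 |
| - Start        | 8440 (99.3%)   | 3 (0.0%)            | 5 (0.1%)              | 1 (0.0%)         | 0 (0.0%)       |                 |
| - Beginning    | 84 (0.4%)      | 10829 (99.0%)       | 8 (0.1%)              | 4 (0.0%)         | 0 (0.0%)       |                 |
| - Intermediate | 66 (0.2%)      | 58 (0.5%)           | 8364 (98.8%)          | 47 (0.3%)        | 0 (0.0%)       |                 |
| - Almost       | 5 (0.1%)       | 48 (0.4%)           | 71 (0.8%)             | 16697 (99.3%)    | 0 (0.0%)       |                 |
| - Final        | 0 (0.0%)       | 3 (0.0%)            | 16 (0.2%)             | 68 (0.4%)        | 6582 (100.0%)  |                 |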

But I want output that also shows the counts and percentages for the rows that have a current level but are missing an initial level. See below (ignore the numbers).

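|                | Start (N=9000) | Beginning (N=10981) | Intermediate (N=8499) | Almost (N=16846) | Final (N=6582) | N-Miss       | Total (N=51432) |
|----------------|----------------|---------------------|-----------------------|------------------|----------------|--------------|-----------------|
| Current Level  |                |                     |                       |                  |                |              |                 |
| - N-Miss       | 28             | 40                  | 35                    | 29               | 0              | 122          |                 |
| - Start        | 8440 (99.3%)   | 3 (0.0%)            | 5 (0.1%)              | 1 (0.0%)         | 0 (0.0%)       | 8224 (16.5%) |                 |
| - Beginning    | 84 (0.4%)      | 10829 (99.0%)       | 8 (0.1%)              | 4 (0.0%)         | 0 (0.0%)       | 1075 (21.2%) |                 |
| - Intermediate | 66 (0.2%)      | 58 (0.5%)           | 8364 (98.8%)          | 47 (0.3%)        | 0 (0.0%)       | 845 (16.5%)  |                 |
| - Almost       | 5 (0.1%)       | 48 (0.4%)           | 71 (0.8%)             | 16697 (99.3%)    | 0 (0.0%)       | 6824 (32.8%) |                 |
| - Final        | 0 (0.0%)       | 3 (0.0%)            | 16 (0.2%)             | 68 (0.4%)        | 6582 (100.0%)  | 672 (13.0%)  |                 |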

Does tableby have functionality to add these missing values?

Upvotes: 0

Views: 653

Answers (1)

Saskia

Reputation: 165

I'm not aware of tableby having built-in functionality to add the missing values of your grouping variable as a column of their own. A workaround is to replace the NA values in your 'Initial Level' variable with the string "Missing", which results in a new column in your table.

library(arsenal)

# Create some dummy data (kept as character columns so ifelse() below returns strings)
my_df <- data.frame(
  initial = sample(c("Beginning", "Intermediate", "Almost", "Final"), 9000, replace = TRUE),
  current = sample(c("Beginning", "Intermediate", "Almost", "Final"), 9000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Set 200 randomly chosen cells to NA; each draw picks a random row and one of the two columns
for (i in 1:200) {
  my_df[sample(nrow(my_df), 1), sample(ncol(my_df), 1)] <- NA
}

# Replace the NA values with "Missing" in your initial variable
my_df$initial <- ifelse(is.na(my_df$initial), "Missing", my_df$initial)


# Factor both variables to order the columns
my_df$initial <- factor(my_df$initial, levels = c("Beginning", "Intermediate", "Almost", "Final", "Missing"))
my_df$current <- factor(my_df$current, levels = c("Beginning", "Intermediate", "Almost", "Final"))

# Your controls
mycontrols  <- tableby.control(test=FALSE, numeric.stats=c("Nmiss", "N", "mean", "median", "q1q3"),
                               cat.stats=c("Nmiss2", "countpct"),
                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))

# Table: initial (left of ~) forms the columns, current forms the rows
summary(tableby(initial ~ current, data = my_df, control = mycontrols))
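To sanity-check the counts in the new "Missing" column without relying on arsenal, a plain base R cross-tabulation with useNA = "ifany" should show the same combinations. This is only a rough sketch on a tiny made-up data frame; the name check_df and its values are hypothetical and not taken from the question.

```r
# Hypothetical toy data with one NA in each variable (not the asker's data)
check_df <- data.frame(
  initial = c("Beginning", "Final", NA, "Final", "Beginning"),
  current = c("Beginning", "Final", "Final", NA, "Final"),
  stringsAsFactors = FALSE
)

# Cross-tabulate current (rows) by initial (columns);
# useNA = "ifany" keeps NA as its own row and column instead of dropping it
counts <- table(
  current = check_df$current,
  initial = check_df$initial,
  useNA = "ifany"
)
print(counts)
```

The NA column of this table holds the rows that have a current level but no initial level, which is what the "Missing" column in the tableby output is meant to count.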

Upvotes: 0
