Descriptive Statistics for Multilevel (Clustered) Data

Question

I am having trouble generating complex cross-sections of descriptive statistics for data that are multilevel in nature. I have approached this from a couple of different angles, without success. Below is the code for a plyr solution that failed. The issue is that schools are nested within districts, and I need the district-level summary statistics to appear alongside every school in that district. The plyr solution only computes district-level statistics within each school's sub-sample, instead of computing them over the whole district and then applying those aggregate district values to each school.

I've been trying to find a way around this for several days, whenever I have a moment.

Would by, aggregate, or data.table offer a better solution?

    # Generate data
    set.seed(500)
    School <- rep(1:20, 2)
    District <- rep(c(rep("East", 10), rep("West", 10)), 2)
    Score <- rnorm(40, 100, 15)
    Student.ID <- sample(1:1000, 40)  # 40 distinct student IDs, one per row
    items <- data.frame(replicate(10, sample(1:4, 40, replace = TRUE)))
    gender <- rep(c("Male", "Female"), 100 * c(0.4, 0.6))  # pool of 40 "Male", 60 "Female"
    gender <- sample(gender, 40)
    low.inc <- rep(c("Status.A", "Status.B", "Status.C"), 100 * c(0.3, 0.2, 0.5))
    low.inc <- sample(low.inc, 40)
    # Convert items to ordered Likert factors; labels must be passed inside factor()
    items <- data.frame(lapply(items, factor, ordered = TRUE,
                               levels = 1:4,
                               labels = c("Strongly disagree", "Disagree",
                                          "Agree", "Strongly Agree")))
    school.data <- data.frame(Student.ID, School, District, Score, items, gender, low.inc)
    # Classify scores relative to the overall mean +/- 1 SD
    sd1 <- sd(school.data$Score)
    m1 <- mean(school.data$Score)
    sd.above <- m1 + sd1
    sd.below <- m1 - sd1
    school.data$scorecat[Score >= sd.above] <- "High"
    school.data$scorecat[Score > sd.below & Score <= sd.above] <- "Moderate"
    school.data$scorecat[Score <= sd.below] <- "Low"

    # Attempt to generate the table
    library(plyr)
    b1 <- ddply(school.data, .variables = c("gender", "District", "School"), .fun = summarise,
        n = length(scorecat),
        high = sum(scorecat %in% c("High")),
        high.prop = high / n,  # refers to a variable created just above
        mod = sum(scorecat %in% c("Moderate")),
        mod.prop = mod / n,    # refers to a variable created just above
        low = sum(scorecat %in% c("Low")),
        low.prop = low / n     # refers to a variable created just above
    )
    drops <- c("high", "mod", "low")  # count columns to drop
    b1 <- b1[, !(names(b1) %in% drops)]
    colnames(b1)[1] <- "Demographic Variable"
colnames(b1)[1] <- "Demographic Variable"

Note: the code below produces the correct district-level values. I'd like a table shaped like the first one, with these district values repeated for each school in the corresponding district.

    b1 <- ddply(school.data, .variables = c("gender", "District"), .fun = summarise,
        n = length(scorecat),
        high = sum(scorecat %in% c("High")),
        high.prop = high / n,  # refers to a variable created just above
        mod = sum(scorecat %in% c("Moderate")),
        mod.prop = mod / n,    # refers to a variable created just above
        low = sum(scorecat %in% c("Low")),
        low.prop = low / n     # refers to a variable created just above
    )
    drops <- c("high", "mod", "low")  # count columns to drop
    b1 <- b1[, !(names(b1) %in% drops)]
    colnames(b1)[1] <- "Demographic Variable"

cmbarbu · Accepted Answer

If I understand correctly, you want to compute a variable at the district level and then assign it to every school in that district. I'm not sure I follow the rest of your post.

In base R you can do this by calling aggregate and then merge.
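A minimal base R sketch of that aggregate-then-merge route follows. It assumes a small hypothetical data set (the column names mimic school.data, but the values are made up) and uses the proportion of "High" scores as the only statistic; the moderate and low proportions would follow the same pattern. It builds a school-level table and a district-level table, then merges them so every school row carries its district's value:

```r
# Hypothetical toy data: students nested in schools nested in districts
school.data <- data.frame(
  District = rep(c("East", "West"), each = 4),
  School   = rep(1:4, each = 2),
  gender   = rep(c("Male", "Female"), 4),
  scorecat = c("High", "Low", "High", "High", "Low", "Low", "Moderate", "High")
)

# Proportion of "High" scores within a group
high.prop <- function(x) mean(x == "High")

# School-level summary: one row per District x School x gender
school.tab <- aggregate(scorecat ~ District + School + gender,
                        data = school.data, FUN = high.prop)
names(school.tab)[names(school.tab) == "scorecat"] <- "school.high.prop"

# District-level summary: one row per District x gender,
# computed over all schools in the district
district.tab <- aggregate(scorecat ~ District + gender,
                          data = school.data, FUN = high.prop)
names(district.tab)[names(district.tab) == "scorecat"] <- "district.high.prop"

# merge() repeats each district value on every school row of that district
combined <- merge(school.tab, district.tab, by = c("District", "gender"))
combined <- combined[order(combined$School, combined$gender), ]
print(combined)
```

For example, the two East/Female rows show their own schools' proportions (0 for School 1, 1 for School 2) next to the same district-wide proportion of 0.5.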

Since you have already computed the summary table b1 with plyr, you can simply merge it back into the original school.data data set.

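    # b1's gender column was renamed "Demographic Variable", so name the keys on each side
    school.data2 <- merge(school.data, b1,
                          by.x = c("District", "gender"),
                          by.y = c("District", "Demographic Variable"))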
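When only one statistic is needed per row, base R's ave() (a different function from the aggregate and merge route described in this answer) computes it within each group and repeats it on every row of that group in a single call. A small sketch, assuming hypothetical toy data and a district mean of Score as the statistic:

```r
# Hypothetical toy data: each row is a student
school.data <- data.frame(
  District = c("East", "East", "East", "West", "West"),
  School   = c(1, 1, 2, 3, 4),
  Score    = c(90, 110, 130, 80, 120)
)

# ave() splits Score by District, applies FUN to each piece, and returns
# a vector the same length as Score, so every row gets its district's mean
school.data$district.mean <- ave(school.data$Score, school.data$District,
                                 FUN = mean)
print(school.data)
```

Additional grouping vectors (for example a gender column) can be passed before FUN to compute the statistic within District x gender cells instead.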

Let me know if that cuts it.
