Reputation: 359
I am having trouble generating cross-sections of descriptive statistics for multilevel data. I have tried a couple of different approaches, but to no avail. Below is the code for a plyr solution that failed. The issue is that schools are nested within districts, and I need the district-level summary statistics repeated for every school in that district. The plyr solution instead computes the statistics separately for each school's sub-sample, rather than applying the aggregate district information to each school.
I've been working on this on and off for several days.
Would by, aggregate, or data.table offer a better solution?
#Generate Data
set.seed(500)
School <- rep(seq(1:20), 2)
District <- rep(c(rep("East", 10), rep("West", 10)), 2)
Score <- rnorm(40, 100, 15)
Student.ID <- sample(1:1000, 8, replace=TRUE) # only 8 IDs are drawn; data.frame() recycles them across the 40 rows
items <- data.frame(replicate(10, sample(1:4, 40, replace=TRUE)))
gender <- rep( c("Male","Female"), 100*c(0.4,0.6) )
gender <- sample(gender, 40)
low.inc <- rep( c("Status.A", "Status.B", "Status.c"), 100*c(0.3,0.2,0.5) )
low.inc <- sample(low.inc, 40)
items <- data.frame(lapply(items, factor, ordered=TRUE,
levels=1:4,
labels=c("Strongly disagree","Disagree",
"Agree","Strongly Agree")))
school.data <- data.frame(Student.ID, School, District, Score, items, gender, low.inc)
sd1 = sd(school.data$Score)
m1 = mean(school.data$Score)
sd.above = m1 + sd1
sd.below = m1 - sd1
school.data$scorecat[Score >= sd.above] <- "High"
school.data$scorecat[Score > sd.below & Score < sd.above] <- "Moderate" # strict upper bound so a score equal to sd.above stays "High"
school.data$scorecat[Score <= sd.below] <- "Low"
#Attempt to generate table
library(plyr)
b1 <- ddply(school.data, .var = c("gender", "District", "School"), .fun = summarise,
n = length(scorecat),
high = sum(scorecat %in% c("High")),
high.prop = high / n, # Referring to vars I just created
mod = sum(scorecat %in% c("Moderate")),
mod.prop = mod / n, # Referring to vars I just created
low = sum(scorecat %in% c("Low")),
low.prop = low / n # Referring to vars I just created
)
drops <- c("high","mod", "low") #set up a list to drop columns
b1 <- b1[,!(names(b1) %in% drops)]
colnames(b1)[1] <- "Demographic Variable"
Note: the code below produces the correct district-level values that should be assigned to every school in the district. I'd like a table like the first example (one row per school), with these district values attached to each school in the corresponding district.
b1 <- ddply(school.data, .var = c("gender", "District"), .fun = summarise,
n = length(scorecat),
high = sum(scorecat %in% c("High")),
high.prop = high / n, # Referring to vars I just created
mod = sum(scorecat %in% c("Moderate")),
mod.prop = mod / n, # Referring to vars I just created
low = sum(scorecat %in% c("Low")),
low.prop = low / n # Referring to vars I just created
)
drops <- c("high","mod", "low") #set up a list to drop columns
b1 <- b1[,!(names(b1) %in% drops)]
colnames(b1)[1] <- "Demographic Variable"
Upvotes: 1
Views: 764
Reputation: 4534
If I understand correctly, you want to compute a variable at the district level and then attach it to the school level. I had trouble following the rest of your post.
You can do that in base R by using aggregate and then merge.
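A minimal sketch of the aggregate-then-merge idea, assuming a small made-up data set (the toy data frame and the column names district.high.prop and school.high.prop are hypothetical, not taken from the question):

```r
# Toy data (hypothetical): 4 schools nested in 2 districts, 5 students each
set.seed(1)
toy <- data.frame(
  School   = rep(1:4, each = 5),
  District = rep(c("East", "West"), each = 10),
  gender   = rep(c("Male", "Female"), times = 10),
  Score    = rnorm(20, 100, 15)
)
# Flag scores at least one SD above the overall mean
toy$high <- toy$Score >= mean(toy$Score) + sd(toy$Score)

# Step 1: district-level proportion, one row per gender x District
district <- aggregate(high ~ gender + District, data = toy, FUN = mean)
names(district)[names(district) == "high"] <- "district.high.prop"

# Step 2: school-level proportion, one row per gender x District x School
school <- aggregate(high ~ gender + District + School, data = toy, FUN = mean)
names(school)[names(school) == "high"] <- "school.high.prop"

# Step 3: repeat the district value on every school row of that district
out <- merge(school, district, by = c("gender", "District"))
```

Each row of out is one gender within one school, carrying both the school's own proportion and the proportion for its whole district.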
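Given that you already computed the district-level summary table b1 with plyr (your second ddply call), you can just merge it back into the initial school.data dataset. Do the merge before renaming the first column of b1 to "Demographic Variable", because merge needs a gender column in both tables.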
school.data2 <- merge(school.data,b1,by=c("District","gender"))
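As a possible alternative to merging (a different technique from the one above), base R's ave() returns a group statistic repeated on every row of the group, so the district value can be attached without building a separate summary table. This is only a sketch on made-up data; the toy data frame and the new column names are hypothetical:

```r
# Toy data (hypothetical): 4 schools nested in 2 districts, 5 students each
set.seed(1)
toy <- data.frame(
  School   = rep(1:4, each = 5),
  District = rep(c("East", "West"), each = 10),
  gender   = rep(c("Male", "Female"), times = 10),
  Score    = rnorm(20, 100, 15)
)
# 1 if the score is at least one SD above the overall mean, else 0
toy$high <- as.numeric(toy$Score >= mean(toy$Score) + sd(toy$Score))

# Proportion of high scores within each gender x District, repeated on every row
toy$district.high.prop <- ave(toy$high, toy$gender, toy$District, FUN = mean)

# Same proportion within each gender x District x School
toy$school.high.prop <- ave(toy$high, toy$gender, toy$District, toy$School, FUN = mean)

# Collapse to one row per gender x District x School
out <- unique(toy[, c("gender", "District", "School",
                      "school.high.prop", "district.high.prop")])
```

The trade-off is that ave() works on one statistic at a time, whereas the merge approach brings over every column of the summary table in one step.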
Let me know if that cuts it.
Upvotes: 2