Reputation: 13
I am trying to create a table that summarizes data from a dataset. I have:
set.seed(123)
age <- runif(100, 1, 100)
gender <- sample(c("Male", "Female"), 100, replace=TRUE)
bmi <- rep(c("Normal"), 100)
height <- runif(100, 150, 190)
smoker <- sample(c("Yes", "No"), 100, replace=TRUE)
finaldata <- data.frame(age, gender, bmi, height, smoker)
str(finaldata)
continuous <- finaldata[ ,c(1, 4)]
categorical <- finaldata[ ,c(2, 3, 5)]
Table1 <- function(CONT, CAT, DIGITS=2){
  # Mean and SD for each continuous variable
  table_cont <- matrix(0, ncol=2, nrow=ncol(CONT))
  for (i in 1:ncol(CONT)){
    table_cont[i, ] <- c(round(mean(CONT[ ,i]), DIGITS), round(sd(CONT[ ,i]), DIGITS))
  }
  # Count and percentage for each level of one categorical variable
  cats <- function(VARIABLE){
    table_cat <- matrix(0, ncol=2, nrow=dim(table(CAT[ ,VARIABLE])))
    for (i in 1:dim(table(CAT[ ,VARIABLE]))){
      table_cat[i, ] <- c(table(CAT[ ,VARIABLE])[i], paste(round(prop.table(table(CAT[ ,VARIABLE]))[i]*100, DIGITS), "%"))
    }
    # names(table(...)) also works when the column is character rather than factor
    rownames(table_cat) <- names(table(CAT[ ,VARIABLE]))
    # Blank spacer row above each variable's block
    table_cat <- rbind(rep("", ncol(table_cat)), table_cat)
    return(table_cat)
  }
  table_cat <- rbind(cats(1), cats(2), cats(3))
  descriptives <- rbind(table_cont, table_cat)
  return(descriptives)
}
Table1(continuous, categorical)
It works fine. That said, to bind the categorical variables I am calling rbind(cats(1), cats(2), cats(3)). While that is OK for this dataset, I don't want to edit that call for every other dataset I use. I tried binding them in a for loop but was unsuccessful. How can I bind them without spelling out rbind(cats(1), cats(2), cats(3)) each time?
Upvotes: 1
Views: 1440
Reputation: 4024
You want to do this instead:
library(dplyr)
library(tidyr)
better_summary = function(data){
  # Split the columns by type
  continuous = Filter(is.numeric, data)
  categorical = Filter(Negate(is.numeric), data)
  continuous_summary =
    continuous %>%
    gather(variable, value) %>%
    group_by(variable) %>%
    summarize(mean = mean(value),
              sd = sd(value))
  categorical_summary =
    categorical %>%
    gather(variable, value) %>%
    count(variable, value) %>%
    # Group first so that percent is computed within each variable
    group_by(variable) %>%
    mutate(percent = n / sum(n))
  list(continuous_summary = continuous_summary,
       categorical_summary = categorical_summary)
}
Upvotes: 0
Reputation: 1361
Try this:
table_cat <- data.frame()
# N is the number of cats() calls you plan to make, e.g. ncol(CAT)
for(i in 1:N){
  table_cat <- rbind(table_cat, cats(i))
}
If you do not want that rownames issue, try this:
table_cat <- matrix(nrow=0, ncol=ncol(cats(1)))
# Loop over however many categorical columns CAT has
for(i in seq_len(ncol(CAT))){
  table_cat <- rbind(table_cat, cats(i))
}
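The same binding can also be written without an explicit loop. The following is only a minimal sketch: cats_for is a hypothetical standalone variant of cats() that takes the data frame as an argument, and the data is made up for illustration.

```r
# Illustrative data with three categorical columns (not the asker's data)
set.seed(123)
categorical <- data.frame(
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  bmi    = rep("Normal", 100),
  smoker = sample(c("Yes", "No"), 100, replace = TRUE)
)

# Hypothetical standalone variant of cats():
# a blank spacer row, then one row per level with its count and percentage
cats_for <- function(CAT, VARIABLE, DIGITS = 2) {
  tab <- table(CAT[[VARIABLE]])
  out <- cbind(as.vector(tab),
               paste0(round(100 * as.vector(prop.table(tab)), DIGITS), "%"))
  rownames(out) <- names(tab)
  rbind(c("", ""), out)
}

# Build one block per column, however many columns there are, then stack them
blocks <- lapply(seq_len(ncol(categorical)),
                 function(i) cats_for(categorical, i))
table_cat <- do.call(rbind, blocks)
table_cat
```

Inside the original Table1, where cats() already sees CAT, the equivalent line would be table_cat <- do.call(rbind, lapply(seq_len(ncol(CAT)), cats)).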
Upvotes: 2
Reputation: 4385
Unless your rows are dependent on each other, you should use functions like apply or plyr's ddply to process the data without all of the for loops.
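cont.func <- function(CONT.col, DIGITS=2){
  c(round(mean(CONT.col), DIGITS), round(sd(CONT.col), DIGITS))
}
CONT = t(apply(continuous,2,cont.func))
cat.func <- function(CAT.col,DIGITS=2){
  tab = table(CAT.col)
  # Counts and percentages per level, followed by a blank spacer row
  rbind(cbind(tab, paste0(round(prop.table(tab)*100, DIGITS), "%")),"")
}
# lapply always returns a list, whereas apply may simplify to a matrix
# when every variable has the same number of levels
CAT = do.call("rbind",lapply(categorical,cat.func))
rbind(CONT,c("",""),CAT)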
Also, you can use as.data.frame around the rbind call in cat.func to preserve the categorical variable name when creating CAT. This may be preferable to using blank quotes depending on your needs.
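One possible way to keep the variable name is to return a data frame per variable and carry the name in its own column. This is only a sketch of that idea, not the exact as.data.frame wrapping described above: cat_df and its column names are hypothetical, and the data is made up for illustration.

```r
# Illustrative data with three categorical columns
set.seed(123)
categorical <- data.frame(
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  bmi    = rep("Normal", 100),
  smoker = sample(c("Yes", "No"), 100, replace = TRUE)
)

# Hypothetical helper: one data frame per variable,
# with the variable name stored in its own column
cat_df <- function(name, data, DIGITS = 2) {
  tab <- table(data[[name]])
  data.frame(
    variable = name,
    level    = names(tab),
    n        = as.vector(tab),
    percent  = paste0(round(100 * as.vector(prop.table(tab)), DIGITS), "%"),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every column name and stack the results
CAT <- do.call(rbind, lapply(names(categorical), cat_df, data = categorical))
CAT
```

Because each row names its variable, no blank spacer rows are needed to tell the blocks apart.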
Upvotes: 0