Reputation: 25
I am new to R. I am trying to get a summary statistics table with the values I already have in this data frame, plus range, frequency and mode.
This is what I have at the moment. I have tried various packages, but I have yet to find one that gives me the measurements I need.
library(dplyr)  # provides %>% and select()

children_allergy_local_df <- data.frame(children_allergy_local)
child_data <- children_allergy_local %>%
  select(childsID, gender, family_allergy, birth_order, birth_weight,
         breastfeeding, house_sqm, pets, smoke, IgE)
child_data_df <- data.frame(child_data)
summary(child_data)
as.data.frame(summary(child_data))
# one column of summary() output per variable
child_data_summary <- do.call(cbind, lapply(child_data, summary))
child_data_summary_df <- data.frame(child_data_summary)
# drop rows 2 and 5 (1st Qu. and 3rd Qu.)
child_data_summary_df <- child_data_summary_df[-c(2, 5), ]
child_data_summary_df
This gives me
          col1  col2  col3  col4  etc.....
min       val   val   val
median    val   val   val
mean      val   val   val
max       val   val   val
My aim is to get
          col1  col2  col3  col4  etc.....
min       val   val   val
median    val   val   val
mean      val   val   val
max       val   val   val
range     val   val   val
frequency val   val   val
mode      val   val   val
Is there a way to create the rows I want? I can't seem to find anything online and I am absolutely stuck. range() seems to give me 2 values and not the single value I need (max - min).
Upvotes: 0
Views: 39
Reputation: 72758
You could create a matrix of the additional values separately and bind both together. This would be expandable at will.
Example:
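library(car)  # loads carData, which provides the Duncan data set
Duncan2 <- Duncan[-1]  # drop the non-numeric "type" column
a <- round(do.call(cbind, lapply(Duncan2, summary))[-c(2, 5), ], 2)
b <- do.call(cbind, lapply(Duncan2, function(x) {
  mat <- matrix(NA, ncol = 3,
                dimnames = list(NULL, c("Range", "Freq.", "Mode")))
  mat[, 1] <- diff(range(x))  # max - min as a single value
  mat[, 2] <- frequency(x)    # time-series frequency, 1 for a plain vector
  mat[, 3] <- mode(x)         # storage mode ("numeric"), not the statistical mode
  return(t(mat))
}))
c <- as.data.frame(rbind(a, b))
c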
# income education prestige
# Min. 7 7 3
# Median 42 45 41
# Mean 41.87 52.56 47.69
# Max. 81 100 97
# Range 74 93 94
# Freq. 1 1 1
# Mode numeric numeric numeric
Hope this helps.
Edit: You could easily wrap it into a function then.
myCustomSum <- function(z) {
  a <- round(do.call(cbind, lapply(z, summary))[-c(2, 5), ], 2)
  b <- do.call(cbind, lapply(z, function(x) {
    mat <- matrix(NA, ncol = 3,
                  dimnames = list(NULL, c("Range", "Freq.", "Mode")))
    mat[, 1] <- diff(range(x))
    mat[, 2] <- frequency(x)
    mat[, 3] <- mode(x)
    return(t(mat))
  }))
  c <- as.data.frame(rbind(a, b))
  return(c)
}
myCustomSum(Duncan2)
# income education prestige
# Min. 7 7 3
# Median 42 45 41
# Mean 41.87 52.56 47.69
# Max. 81 100 97
# Range 74 93 94
# Freq. 1 1 1
# Mode numeric numeric numeric
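One caveat worth flagging: in the output above the Freq. row is always 1 and the Mode row reads "numeric", because base R's frequency() returns the time-series frequency and mode() returns the storage mode. Below is a minimal sketch of statistical replacements, assuming "mode" means the most frequent value and "frequency" means how often that value occurs. stat_mode and mode_freq are hypothetical helper names, not base R functions, and the vector x is made up for illustration.

```r
# Hypothetical helpers; neither exists in base R.
# Most frequent value of x (the first one encountered wins on ties).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Number of times the most frequent value occurs.
mode_freq <- function(x) {
  max(table(x))
}

x <- c(3, 7, 7, 2, 7, 3)  # toy data, made up for illustration
stat_mode(x)    # 7
mode_freq(x)    # 3
diff(range(x))  # 5, i.e. max(x) - min(x)
```

Either helper could be dropped into the lapply() call in place of frequency(x) or mode(x), if that is the meaning you are after.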
Upvotes: 1
Reputation: 59
There certainly is! I'll share mine. Instead of continuing your code, I'll start from (almost) the beginning and assume that child_data_df is your data frame of interest. I had to get a little creative because of the range function. You'll need the dplyr package.
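library(dplyr)

# base R's mode() returns the storage mode ("numeric"), so use a small helper
# for the most frequent value (the first one wins on ties)
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# assumes every column of child_data_df is numeric
child_summary <- as.data.frame(
  t(  # we have to transpose to look the way you want
    do.call(data.frame,
            list(min    = apply(child_data_df, 2, min),
                 median = apply(child_data_df, 2, median),
                 mean   = apply(child_data_df, 2, mean),
                 max    = apply(child_data_df, 2, max),
                 freq   = apply(child_data_df, 2, length),
                 mode   = apply(child_data_df, 2, stat_mode))) %>%
      mutate(range = max - min)))
names(child_summary) <- names(child_data_df)  # make sure the var names are kept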
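A note on apply(): it converts the data frame to a matrix first, so if any column is non-numeric (an ID or a gender column, say) the whole matrix becomes character and the numeric statistics no longer behave as intended. Here is a rough sketch of keeping only the numeric columns before summarising. The toy data frame and its column names are made up for illustration, so swap in child_data_df.

```r
# Toy data frame, made up for illustration; "id" is deliberately non-numeric.
toy <- data.frame(id     = c("a", "b", "c", "d"),
                  weight = c(3.1, 3.4, 2.9, 3.4),
                  order  = c(1, 2, 1, 3))

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if a single column survives.
is_num  <- sapply(toy, is.numeric)
toy_num <- toy[, is_num, drop = FALSE]

# sapply() works column by column, so nothing is coerced to character.
toy_stats <- sapply(toy_num, function(x) {
  c(min    = min(x),
    median = median(x),
    max    = max(x),
    range  = diff(range(x)))
})
toy_stats
```

The result is a matrix with one row per statistic and one column per numeric variable, which as.data.frame() will turn into the layout asked for in the question.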
Upvotes: 1