Manually adding new rows to a summary data frame

I am new to R and am trying to build a summary statistics table that contains the values I already have in this data frame plus range, frequency and mode.

This is what I have at the moment. I have tried various packages, but I have yet to find one that gives me all the measurements I need.

library(dplyr)  # provides %>% and select()

children_allergy_local_df <- data.frame(children_allergy_local)

child_data <- children_allergy_local %>%
  select(childsID, gender, family_allergy, birth_order, birth_weight,
         breastfeeding, house_sqm, pets, smoke, IgE)
child_data_df <- data.frame(child_data)

summary(child_data)
as.data.frame(summary(child_data))
child_data_summary <- do.call(cbind, lapply(child_data, summary)) 

child_data_summary_df <- data.frame(child_data_summary)

child_data_summary_df <- child_data_summary_df[-c(2, 5), ]
child_data_summary_df

This gives me:

        col1  col2  col3  col4 etc.
min      val   val   val
median   val   val   val
mean     val   val   val
max      val   val   val

My aim is to get:

          col1  col2  col3  col4 etc.
min        val   val   val
median     val   val   val
mean       val   val   val
max        val   val   val
range      val   val   val
frequency  val   val   val
mode       val   val   val

Is there a way to create the rows I want? I can't seem to find anything online and I am absolutely stuck. range() gives me two values (the minimum and the maximum) rather than the single value I need (max - min).

Upvotes: 0

Views: 39

Answers (2)

jay.sf

Reputation: 72758

You could create a matrix of the additional values separately and bind both together. This would be expandable at will.

Example:

library(car)
Duncan2 <- Duncan[-1]

a <- round(do.call(cbind, lapply(Duncan2, summary))[-c(2, 5), ], 2)

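b <- do.call(cbind, lapply(Duncan2, function(x){
  # one-row matrix per column, holding the three extra statistics
  mat <- matrix(NA, ncol = 3, 
                dimnames = list(NULL, c("Range", "Freq.", "Mode")))
  mat[,1] <- diff(range(x))  # max - min as a single value
  mat[,2] <- frequency(x)    # base frequency(): time-series frequency, 1 for a plain vector
  mat[,3] <- mode(x)         # base mode(): storage mode (e.g. "numeric"), not the most frequent value
  return(t(mat))
}))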

c <- as.data.frame(rbind(a, b))
c
#         income education prestige
# Min.         7         7        3
# Median      42        45       41
# Mean     41.87     52.56    47.69
# Max.        81       100       97
# Range       74        93       94
# Freq.        1         1        1
# Mode   numeric   numeric  numeric

Hope this helps.

Edit: You could then easily wrap it in a function.

myCustomSum <- function(z){
  a <- round(do.call(cbind, lapply(z, summary))[-c(2, 5), ], 2)
  b <- do.call(cbind, lapply(z, function(x){
    mat <- matrix(NA, ncol = 3, 
                  dimnames = list(NULL, c("Range", "Freq.", "Mode")))
    mat[,1] <- diff(range(x))
    mat[,2] <- frequency(x)
    mat[,3] <- mode(x)
    return(t(mat))
  }))
  c <- as.data.frame(rbind(a, b))
  return(c)
}

myCustomSum(Duncan2)
#         income education prestige
# Min.         7         7        3
# Median      42        45       41
# Mean     41.87     52.56    47.69
# Max.        81       100       97
# Range       74        93       94
# Freq.        1         1        1
# Mode   numeric   numeric  numeric
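
If the "Mode" and "Freq." rows are meant to hold the most frequent value and how often it occurs, rather than the storage mode and time-series frequency shown above, base R has no built-in for that. The following is only a minimal sketch under that assumption: stat_mode() and mode_count() are hypothetical helper names, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: most frequent value of a vector (first one wins on ties).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: how many times the most frequent value occurs.
mode_count <- function(x) {
  x <- x[!is.na(x)]
  max(table(x))
}

# Made-up toy data, only for illustration.
toy <- data.frame(a = c(1, 2, 2, 3, 5), b = c(10, 10, 10, 40, 50))

# One named vector per statistic; rbind() turns each into a row.
extra <- rbind(
  Range = sapply(toy, function(x) diff(range(x, na.rm = TRUE))),
  Freq. = sapply(toy, mode_count),
  Mode  = sapply(toy, stat_mode)
)
extra
```

A matrix built this way stays numeric, so it could be bound under the summary rows with rbind() without turning the whole table into character strings.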

Upvotes: 1

GVianaF

Reputation: 59

There certainly is! I'll share mine. Instead of continuing your code, I'll start from (almost) the beginning and assume that child_data_df is your data frame of interest. I had to get a little creative because of the range function. You'll need the dplyr package.

library(dplyr)
child_summary <- as.data.frame(
  t(  # we have to transpose to look the way you want
    do.call(data.frame,
            list(min = apply(child_data_df, 2, min),
                 median = apply(child_data_df, 2, median),
                 mode = apply(child_data_df, 2, mode),  # base mode(): storage mode, not the most frequent value
                 max = apply(child_data_df, 2, max),
                 freq = apply(child_data_df, 2, length))) %>%
      mutate(range = max - min)))  # range as a single value
names(child_summary) <- names(child_data_df)  # restore the var names in case they were lost
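
For comparison, the same kind of table can be sketched in base R without dplyr. This is only an illustration under assumptions: the toy values are made up, the columns are assumed to be numeric, and the row called n is simply the count of non-missing observations.

```r
# Made-up toy data; the column names are borrowed from the question.
toy <- data.frame(birth_weight = c(3.1, 3.4, 2.9, 3.4),
                  IgE          = c(12, 30, 18, 25))

# range() returns c(min, max); diff() collapses that pair to max - min.
range(toy$IgE)
diff(range(toy$IgE))

# sapply() runs the function once per column; each named element becomes a row.
child_summary <- as.data.frame(sapply(toy, function(x) c(
  min    = min(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  mean   = mean(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  range  = diff(range(x, na.rm = TRUE)),
  n      = sum(!is.na(x))
)))
child_summary
```

Because every statistic is computed inside one function, no transpose is needed and the variable names are kept as column names.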

Upvotes: 1
