Stataq

Reputation: 2297

How to build a new data frame with summarize and cut for each age group

I want to build a new data frame (age_summary) with the total number of people in each age group. I would like to use cut(). My code is:

set.seed(12345)

# Create a numeric variable AGE
AGE <- sample(0:110, 100, replace = TRUE)

# Create the data frame
Sample.data <- data.frame(AGE)

age_summary <- Sample.data %>%  summarize(group_by(Sample.data,
                                                   cut(
                                                     AGE,
                                                     breaks=c(0,0.001, 0.083, 2, 13, 65,1000),
                                                     right=TRUE,
                                                     labels = c("Foetus(0 yr)","Neonate (0.001 - 0.082 yr)","Infant(0.083-1.999 yrs)","Child(2-12.999 yrs)", "Adolescent(13-17.999 yrs)","Adult(18-64.999 yrs.)","Elderly(65-199 yrs)")
  ),"Total people" = n())
)

However, my code does not work, and I am not sure what went wrong. Any suggestions on how to fix it?

Add: I was able to get results that look like this:

[image not shown]

Is it possible to get something that looks like this instead? [image not shown]

Here is what I get with adorn_totals(.) on a new data set. The total number of people looks OK, but the average age looks strange.

[image not shown]

Any idea?

Upvotes: 1

Views: 245

Answers (1)

akrun

Reputation: 887223

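If we remove the summarise() wrapped around group_by(), the issue is easier to see. cut() needs exactly one label per interval, i.e. one fewer label than breaks, but the code in the question passes 7 breaks (6 intervals) and 7 labels. Adding -Inf (or Inf) to breaks creates a seventh interval, so the lengths match: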

library(dplyr)

Sample.data %>%
  group_by(grp = cut(AGE,
                     breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                     right = TRUE,
                     labels = c("Foetus(0 yr)",
                                "Neonate (0.001 - 0.082 yr)",
                                "Infant(0.083-1.999 yrs)",
                                "Child(2-12.999 yrs)",
                                "Adolescent(13-17.999 yrs)",
                                "Adult(18-64.999 yrs.)",
                                "Elderly(65-199 yrs)"))) %>%
  summarise(TotalPeople = n())
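
One thing to watch: with the breaks above, the intervals do not line up with the ranges written in the labels (for example, (13, 65] is labelled "Adult(18-64.999 yrs.)"). A minimal sketch of breaks that follow the label text, assuming the ranges in the labels are meant literally; the break at 18, the use of Inf as the upper bound, right = FALSE, and the names age_breaks and age_labels are my assumptions, not part of the original code:

```r
library(dplyr)

set.seed(12345)
Sample.data <- data.frame(AGE = sample(0:110, 100, replace = TRUE))

# Assumption: each label's stated range is taken literally, so a break at 18
# is added and intervals are closed on the left (right = FALSE).
# Inf is used as the upper bound instead of the 199 mentioned in the label.
age_breaks <- c(-Inf, 0.001, 0.083, 2, 13, 18, 65, Inf)
age_labels <- c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                "Elderly(65-199 yrs)")

# cut() needs one label per interval, i.e. one fewer label than breaks.
stopifnot(length(age_labels) == length(age_breaks) - 1)

age_summary <- Sample.data %>%
  group_by(grp = cut(AGE, breaks = age_breaks,
                     labels = age_labels, right = FALSE)) %>%
  summarise(TotalPeople = n())
```

Checking the lengths up front makes a label/break mismatch fail early with a clear message instead of inside the pipeline.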

If we need a total row in which a different function is applied to each column (a sum for the counts, a mean for the ages), we can use add_row():

library(tibble)
library(tidyr)

Sample.data %>%
  group_by(grp = cut(AGE,
                     breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                     right = TRUE,
                     labels = c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                                "Elderly(65-199 yrs)"))) %>%
  summarise(TotalPeople = n(), Ave_age = mean(AGE)) %>%
  complete(grp = levels(grp), fill = list(TotalPeople = 0)) %>%
  add_row(grp = "Total",
          TotalPeople = sum(.$TotalPeople),
          Ave_age = mean(.$Ave_age, na.rm = TRUE))
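
Note that mean(.$Ave_age) is the unweighted mean of the group averages, which generally differs from the average age of all people when the groups have different sizes. Summing the column, which I believe is what adorn_totals() does, would give yet another number, and that may explain the strange value in the question. If the overall average is what is wanted, a possible sketch weights each group mean by its count. The three age groups and the names by_group and with_total are simplified, hypothetical stand-ins to keep the example short:

```r
library(dplyr)
library(tibble)

set.seed(12345)
Sample.data <- data.frame(AGE = sample(0:110, 100, replace = TRUE))

# Hypothetical, simplified grouping (three groups instead of seven).
by_group <- Sample.data %>%
  group_by(grp = cut(AGE, breaks = c(-Inf, 18, 65, Inf), right = FALSE,
                     labels = c("Under 18", "18-64", "65+"))) %>%
  summarise(TotalPeople = n(), Ave_age = mean(AGE)) %>%
  mutate(grp = as.character(grp))  # plain character so "Total" can be added

# Weighting each group mean by its count recovers the overall mean age.
with_total <- by_group %>%
  add_row(grp = "Total",
          TotalPeople = sum(by_group$TotalPeople),
          Ave_age = weighted.mean(by_group$Ave_age, by_group$TotalPeople))
```

Here the "Total" row's Ave_age equals mean(Sample.data$AGE), up to floating-point error.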

Upvotes: 1
