uthpala
uthpala

Reputation: 5

How to get the count of occurence by a given range of a values in a perticular column in a data frame in R?

My dataframe is as follows:

ds <- c("ab", "ab", "cd", "ab", NA, "gh")
member1age<-c(9, 21, 39, NA, 69, 90)
member1gender<-c(1,2,1,1,2,NA)
member2age<-c(15,30, 60, 21,12, 45)
member2gender<-c(1,2,2,1,2,1)
member3age<-c(17,2,90,NA,31,5)
member3gender<-c(1,1,2,NA, 1,2)
member4age<-c(NA, NA,23, NA,NA,NA)
member4gender<-c(NA,NA,1,NA,NA,NA)

df <- data.frame(ds,member1age,member1gender,member2age,member2gender,member3age,member3gender,member4age,member4gender)

I want to find out how many individuals,are there in each gender for the given age ranges in a perticular ds.

For example how many males between age 15-30 for ds- "ab". What i want as the output is something like the following; Table A:

Age Category    Number of Males %   Number of Females   %   Total Number %
Below 5                 1    20.00             0     -          1      14.29 
6-14                    1    20.00             0     -          1      14.29 
15-30                   3    60.00             2     100.00     5      71.43 
31-45                   0    -                 0     -          0      -   
46-60                   0    -                 0     -          0      -   
> 60                    0    -                 0     -          0      -   
Total                   5    100.00            2     100.00     7      100.00 

what i have tried so far is

for each of memberage column applied the following;

forage1<-data.frame(ddply(df, .(ds), summarize,
                      All=length(ds),
                      of6Age=sum(Age1<6),
                      of6.15Age=sum(Age1>=6 & Age1<15),
                      of15.31Age=sum(Age1>=15 & Age1<31),
                      of31.46Age=sum(Age1>=31 & Age1<46),
                      of46.60Age=sum(Age1>=46 & Age1<=60),
                      of60Age=sum(Age1>60)))

And then merged the data frames together as follows; mylist<-list(forage1, forage2, forage3,forage4, forage5, forage6, forage7, forage8) f1<-ldply(mylist, rbind)

However this is not what i supposed to get.What i want is the Table A mentioned above. Please enlighten me with your kind support... Thank you in advance...!!!

Upvotes: 0

Views: 133

Answers (2)

Colonel Beauvel
Colonel Beauvel

Reputation: 31171

You should first build your data in a proper way:

df <- data.frame(ds = rep(ds,4),
                 age = c(member1age,member2age,member3age, member4age),   
                 gender = c(member1gender,member2gender,member3gender,member4gender))

Then use cut to classify your population age:

df$AgeCategory = cut(df$age, 
                     breaks=c(-1,5,14,30,45,60,Inf), 
                     labels=c('Below 5','6-14','15-30','31-45','46-60','>60'),
                     right=T)

Finally summarise - here I use data.table package:

library(data.table)

Then the data frame df has to be converted to a data table

df = setDT(df)

After that

dt = setkey(dt, AgeCategory, ds)
dt = dt[!is.na(ds) & !is.na(AgeCategory)]      #not interested by NA in these columns
dt = dt[CJ(unique(AgeCategory),unique(ds)), allow.cartesian=T][is.na(gender), gender:=0]

dt[ds=='ab', list(MaleNumber=sum(gender==1), FemaleNumber=sum(gender==2)),AgeCategory][
   ,c('MaleRatio', 'FemaleRatio'):=list(MaleNumber/sum(MaleNumber), 
                                        FemaleNumber/sum(FemaleNumber))][]

#   AgeCategory MaleNumber FemaleNumber MaleRatio FemaleRatio
#1:     Below 5          1            0       0.2           0
#2:        6-14          1            0       0.2           0
#3:       15-30          3            2       0.6           1
#4:       31-45          0            0       0.0           0
#5:       46-60          0            0       0.0           0
#6:         >60          0            0       0.0           0

Upvotes: 2

Thierry
Thierry

Reputation: 18487

First convert your df into a long format

library(tidyr)
library(dplyr)
long <- df %>% 
  add_rownames("id") %>% 
  gather(complex, value, -ds, -id) %>% 
  extract(
    complex, 
    c("member", "type"), 
    "([[:alpha:]]+[[:digit:]])([[:alpha:]]+)"
  ) %>%
  spread(type, value)

Then use cut() to split the age into groups

long %>%
  mutate(
    gender = factor(
      gender, 
      levels = c(1, 2), 
      labels = c("male", "female")
    ), 
    fAge = cut(age, c(0, 5, 14, 30, 45, 60, 100))
  ) %>%
  filter(!is.na(fAge), !is.na(gender)) %>%
  group_by(fAge, gender) %>% 
  summarise(count = n()) %>%
  spread(gender, count, fill = 0)

Upvotes: 0

Related Questions