Reputation: 3394

Table by groups using dplyr

Here is my replicating example.

HAVE <- data.frame(ID=c(1,2,3,4,5,6),
                   CLASS=c("A","A","B","B","C","C"),
                   AGE=c(14,13,11,12,14,14),
                   GENDER=c('MALE','MALE','FEMALE','MALE','FEMALE','FEMALE'))


WANT <- data.frame(COLUMN=c('AGE','GENDER = MALE'),
                   CLASSA=c(13.5,100),
                   CLASSB=c(11.5,50),
                   CLASSC=c(14,0))

Essentially the goal is to make a new data frame that shows the means for numeric variables and percentages for factor variables.

This is my coding attempt
HAVE %>%
  count(HAVE[,]) %>%
  group_by(CLASS) %>%
  mutate(mean)

Upvotes: 0

Answers (4)

IceCreamToucan

Reputation: 28705

Attempt at a base R solution

list.out <- 
  lapply(HAVE[-(1:2)], function(x){
        if(is.factor(x)) x <- x == levels(x)[2]
        aggregate(x, list(HAVE$CLASS), mean)$x})

out <- do.call(rbind, list.out)
colnames(out) <- unique(HAVE$CLASS)

out
#           A    B  C
# AGE    13.5 11.5 14
# GENDER  1.0  0.5  0

Upvotes: 1

Jeremy K.

Reputation: 1792

This should work.

HAVE %>% 
  group_by(CLASS) %>% 
  summarise(mean_age = mean(AGE), percent_male = mean(GENDER == "MALE")*100) %>% 
  t()

And you get:

             [,1]   [,2]   [,3]  
CLASS        "A"    "B"    "C"   
mean_age     "13.5" "11.5" "14.0"
percent_male "100"  " 50"  "  0"

Looks like penguin beat me to it, but I'll post mine, too, because the code is has some minor things that are maybe more streamlined--very minor.

And if you want all numeric variables to have the mean taken, and all factors to be percentage, I believe you can nest(), map(), and unnest(). Maybe someone can provide code for that.

Upvotes: 0

Nicolas2

Reputation: 2210

With the 'tables' package you can get:

library(tables)
tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE)
#                       CLASS         
#                       A     B    C  
#        AGE    mean     13.5 11.5  14
# GENDER FEMALE Percent   0.0 50.0 100
#        MALE   Percent 100.0 50.0   0

And you can subset only MALEs:

tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE) [-2,]

#               CLASS        
#               A     B    C 
#  AGE  mean     13.5 11.5 14
#  MALE Percent 100.0 50.0  0

Upvotes: 2

penguin

Reputation: 1376

Something similar

HAVE %>% 
    select(GENDER, AGE, CLASS) %>% 
    group_by(CLASS) %>% 
    summarise(AGE = mean(AGE), GENDER_MALE = sum(ifelse(GENDER == "MALE", 1, 0))*100/n()) %>% 
    t()

Output

            [,1]   [,2]   [,3]  
CLASS       "A"    "B"    "C"   
AGE         "13.5" "11.5" "14.0"
GENDER_MALE "100"  " 50"  "  0"

Upvotes: 1

Table by groups using dplyr

Answers (4)

Related Questions