Amyly

Reputation: 47

R summing two rows

I have a data frame that looks like this:

soccer <- data.frame(
      A=c("soccer1", "soccer1", "soccer2", "soccer2", "soccer3", "soccer3"),
      game=c(1, 2, 1, 2, 1, 2),
      number=c(1000, 1500, 200, 2100, 650, 1850)
)

I am trying to group the rows that share the same value in column "A" (e.g. soccer1), and then use summarise() to show the number of fans at game 1 as a percentage of the total number of fans at games 1 and 2. I am stuck here:

soccer%>%
   group_by(A)%>%
   summarise(n=n[game==1]/n[game==2+game==1]*100)

I cannot figure out how to sum two numbers for the denominator.

Upvotes: 1

Views: 55

Answers (1)

jpsmith

Reputation: 17195

You could use either of the approaches below inside mutate(), depending on whether you want the raw proportions (numeric) or a formatted percentage (character):

library(dplyr)
soccer %>% 
  group_by(A) %>% 
  mutate(prop = number / sum(number), # for proportions (numeric)
         perc = paste0(sprintf("%2.f", number / sum(number) * 100), "%")) # for percentage (character)

Output:

#   A        game number   prop perc 
#   <chr>   <dbl>  <dbl>  <dbl> <chr>
# 1 soccer1     1   1000 0.4    "40%"
# 2 soccer1     2   1500 0.6    "60%"
# 3 soccer2     1    200 0.0870 " 9%"
# 4 soccer2     2   2100 0.913  "91%"
# 5 soccer3     1    650 0.26   "26%"
# 6 soccer3     2   1850 0.74   "74%"
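
If you want one row per group, as the summarise() attempt in the question suggests, the same idea can be sketched as follows. The names game1_share and pct_game1 are made up for illustration, and sum() is wrapped around the numerator on the assumption that a group might have more than one game 1 row:

```r
library(dplyr)

soccer <- data.frame(
  A = c("soccer1", "soccer1", "soccer2", "soccer2", "soccer3", "soccer3"),
  game = c(1, 2, 1, 2, 1, 2),
  number = c(1000, 1500, 200, 2100, 650, 1850)
)

# One row per team: game 1 attendance as a percentage of total attendance.
# game1_share and pct_game1 are illustrative names, not from the original post.
game1_share <- soccer %>%
  group_by(A) %>%
  summarise(pct_game1 = sum(number[game == 1]) / sum(number) * 100)

game1_share
```

The denominator is just sum(number) within each group, so there is no need to pick out games 1 and 2 separately.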

In base R, you could use ave(), which returns the result in the original row order even if the rows are not sorted by A:

soccer$props <- ave(soccer$number, soccer$A, 
                    FUN = function(x) x / sum(x))


#        A game number      props
#1 soccer1    1   1000 0.40000000
#2 soccer1    2   1500 0.60000000
#3 soccer2    1    200 0.08695652
#4 soccer2    2   2100 0.91304348
#5 soccer3    1    650 0.26000000
#6 soccer3    2   1850 0.74000000
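
A base R counterpart that returns a single game 1 percentage per team, which is what the question asks for, might look like the sketch below. The name pct_game1 is hypothetical, and the code assumes the columns are named as in the question:

```r
soccer <- data.frame(
  A = c("soccer1", "soccer1", "soccer2", "soccer2", "soccer3", "soccer3"),
  game = c(1, 2, 1, 2, 1, 2),
  number = c(1000, 1500, 200, 2100, 650, 1850)
)

# Split the data frame by team, then compute game 1 attendance as a
# percentage of that team's total. pct_game1 is an illustrative name.
pct_game1 <- sapply(
  split(soccer, soccer$A),
  function(d) sum(d$number[d$game == 1]) / sum(d$number) * 100
)

pct_game1
```

The result is a named numeric vector with one element per team rather than a data frame.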

As @Adam Quek points out in the comments, proportions() (available in R 4.0.1 and later) offers a more concise solution:

soccer %>% group_by(A) %>% mutate(prop = proportions(number))

Upvotes: 1
