dplyr to calculate fraction by group

Question

there are only 2 farms, but tons of fruit. trying to see which farm has been performing better over 3 years where the performance is simply farm_i / (farm₁ + farm₂), so for the fruit==peach farm₁ performance was 20% vs. farm₂ 80%

sample data:

df <- data.frame(fruit = c("apple", "apple", "peach", "peach", "pear", "pear", "lime", "lime"),
                    farm = as.factor(c(1,2,1,2,1,2,1,2)), 'y2019' = c(0,0,3,12,0,7,4,6), 
                    'y2018' = c(5,3,0,0,8,2,0,0),'y2017' = c(4,5,7,15,0,0,0,0) )
> df
  fruit farm y2019 y2018 y2017
1 apple    1     0     5     4
2 apple    2     0     3     5
3 peach    1     3     0     7
4 peach    2    12     0    15
5  pear    1     0     8     0
6  pear    2     7     2     0
7  lime    1     4     0     0
8  lime    2     6     0     0
>

desired output:

 out
  fruit farm y2019 y2018    y2017
1 apple    1   0.0 0.625 0.444444
2 apple    2   0.0 0.375 0.555556
3 peach    1   0.2 0.000 0.318818
4 peach    2   0.8 0.000 0.681818
5  pear    1   0.0 0.800 0.000000
6  pear    2   1.0 0.200 0.000000
7  lime    1   0.4 0.000 0.000000
8  lime    2   0.6 0.000 0.000000
>

this is a far as i could go:

df %>% 
  group_by(fruit) %>% 
  summarise(across(where(is.numeric), sum))

akrun · Accepted Answer

We can group by 'fruit', mutate across the columns that starts with 'y' to divide the elements by the sum of the values in those columns and if all values are 0, then return 0

library(dplyr)
df %>%
   group_by(fruit) %>% 
   mutate(across(starts_with('y'), ~ if(all(. == 0)) 0 else ./sum(.)))

# A tibble: 8 x 5
# Groups:   fruit [4]
#  fruit farm  y2019 y2018 y2017
#      
#1 apple 1       0   0.625 0.444
#2 apple 2       0   0.375 0.556
#3 peach 1       0.2 0     0.318
#4 peach 2       0.8 0     0.682
#5 pear  1       0   0.8   0    
#6 pear  2       1   0.2   0    
#7 lime  1       0.4 0     0    
#8 lime  2       0.6 0     0

NOTE: Here, we just used dplyr package and it is done in a single step

Or another option is adorn_percentages from janitor

library(janitor)
library(purrr)
df %>%
    group_split(fruit) %>%
    map_dfr(adorn_percentages, denominator = "col") %>%
    as_tibble

Or using data.table

library(data.table)
setDT(df)[, (3:5) := lapply(.SD, function(x) if(all(x == 0)) 0 
        else x/sum(x, na.rm = TRUE)), .SDcols = 3:5, by = fruit][]

Or using base R

grpSums <- rowsum(df[3:5], df$fruit)
df[3:5] <- df[3:5]/grpSums[match(df$fruit, row.names(grpSums)),]

dplyr to calculate fraction by group

Answers (2)

Related Questions