gregV
gregV

Reputation: 1097

dplyr to calculate fraction by group

there are only 2 farms, but tons of fruit. trying to see which farm has been performing better over 3 years where the performance is simply farmi / (farm1 + farm2), so for the fruit==peach farm1 performance was 20% vs. farm2 80%

sample data:

df <- data.frame(fruit = c("apple", "apple", "peach", "peach", "pear", "pear", "lime", "lime"),
                    farm = as.factor(c(1,2,1,2,1,2,1,2)), 'y2019' = c(0,0,3,12,0,7,4,6), 
                    'y2018' = c(5,3,0,0,8,2,0,0),'y2017' = c(4,5,7,15,0,0,0,0) )
> df
  fruit farm y2019 y2018 y2017
1 apple    1     0     5     4
2 apple    2     0     3     5
3 peach    1     3     0     7
4 peach    2    12     0    15
5  pear    1     0     8     0
6  pear    2     7     2     0
7  lime    1     4     0     0
8  lime    2     6     0     0
>

desired output:

 out
  fruit farm y2019 y2018    y2017
1 apple    1   0.0 0.625 0.444444
2 apple    2   0.0 0.375 0.555556
3 peach    1   0.2 0.000 0.318818
4 peach    2   0.8 0.000 0.681818
5  pear    1   0.0 0.800 0.000000
6  pear    2   1.0 0.200 0.000000
7  lime    1   0.4 0.000 0.000000
8  lime    2   0.6 0.000 0.000000
>

this is a far as i could go:

df %>% 
  group_by(fruit) %>% 
  summarise(across(where(is.numeric), sum))

Upvotes: 1

Views: 532

Answers (2)

akrun
akrun

Reputation: 886948

We can group by 'fruit', mutate across the columns that starts with 'y' to divide the elements by the sum of the values in those columns and if all values are 0, then return 0

library(dplyr)
df %>%
   group_by(fruit) %>% 
   mutate(across(starts_with('y'), ~ if(all(. == 0)) 0 else ./sum(.)))

# A tibble: 8 x 5
# Groups:   fruit [4]
#  fruit farm  y2019 y2018 y2017
#  <chr> <fct> <dbl> <dbl> <dbl>
#1 apple 1       0   0.625 0.444
#2 apple 2       0   0.375 0.556
#3 peach 1       0.2 0     0.318
#4 peach 2       0.8 0     0.682
#5 pear  1       0   0.8   0    
#6 pear  2       1   0.2   0    
#7 lime  1       0.4 0     0    
#8 lime  2       0.6 0     0    

NOTE: Here, we just used dplyr package and it is done in a single step

Or another option is adorn_percentages from janitor

library(janitor)
library(purrr)
df %>%
    group_split(fruit) %>%
    map_dfr(adorn_percentages, denominator = "col") %>%
    as_tibble

Or using data.table

library(data.table)
setDT(df)[, (3:5) := lapply(.SD, function(x) if(all(x == 0)) 0 
        else x/sum(x, na.rm = TRUE)), .SDcols = 3:5, by = fruit][]

Or using base R

grpSums <- rowsum(df[3:5], df$fruit)
df[3:5] <- df[3:5]/grpSums[match(df$fruit, row.names(grpSums)),]

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

We can use prop.table to calculate the proportions for each fruit.

library(dplyr)

df %>% 
  group_by(fruit) %>% 
  mutate(across(where(is.numeric), prop.table),
         #to replace `NaN` with 0
         across(where(is.numeric), tidyr::replace_na, 0))

#  fruit farm  y2019 y2018 y2017
#  <chr> <fct> <dbl> <dbl> <dbl>
#1 apple 1       0   0.625 0.444
#2 apple 2       0   0.375 0.556
#3 peach 1       0.2 0     0.318
#4 peach 2       0.8 0     0.682
#5 pear  1       0   0.8   0    
#6 pear  2       1   0.2   0    
#7 lime  1       0.4 0     0    
#8 lime  2       0.6 0     0    

Upvotes: 1

Related Questions