Reputation: 1097
there are only 2 farms, but tons of fruit
. trying to see which farm has been performing better over 3 years where the performance is simply farmi / (farm1 + farm2), so for the fruit==peach
farm1 performance was 20% vs. farm2 80%
sample data:
df <- data.frame(fruit = c("apple", "apple", "peach", "peach", "pear", "pear", "lime", "lime"),
farm = as.factor(c(1,2,1,2,1,2,1,2)), 'y2019' = c(0,0,3,12,0,7,4,6),
'y2018' = c(5,3,0,0,8,2,0,0),'y2017' = c(4,5,7,15,0,0,0,0) )
> df
fruit farm y2019 y2018 y2017
1 apple 1 0 5 4
2 apple 2 0 3 5
3 peach 1 3 0 7
4 peach 2 12 0 15
5 pear 1 0 8 0
6 pear 2 7 2 0
7 lime 1 4 0 0
8 lime 2 6 0 0
>
desired output:
out
fruit farm y2019 y2018 y2017
1 apple 1 0.0 0.625 0.444444
2 apple 2 0.0 0.375 0.555556
3 peach 1 0.2 0.000 0.318818
4 peach 2 0.8 0.000 0.681818
5 pear 1 0.0 0.800 0.000000
6 pear 2 1.0 0.200 0.000000
7 lime 1 0.4 0.000 0.000000
8 lime 2 0.6 0.000 0.000000
>
this is a far as i could go:
df %>%
group_by(fruit) %>%
summarise(across(where(is.numeric), sum))
Upvotes: 1
Views: 532
Reputation: 886948
We can group by 'fruit', mutate
across
the columns that starts with 'y' to divide the elements by the sum
of the values in those columns and if
all
values are 0, then return 0
library(dplyr)
df %>%
group_by(fruit) %>%
mutate(across(starts_with('y'), ~ if(all(. == 0)) 0 else ./sum(.)))
# A tibble: 8 x 5
# Groups: fruit [4]
# fruit farm y2019 y2018 y2017
# <chr> <fct> <dbl> <dbl> <dbl>
#1 apple 1 0 0.625 0.444
#2 apple 2 0 0.375 0.556
#3 peach 1 0.2 0 0.318
#4 peach 2 0.8 0 0.682
#5 pear 1 0 0.8 0
#6 pear 2 1 0.2 0
#7 lime 1 0.4 0 0
#8 lime 2 0.6 0 0
NOTE: Here, we just used dplyr
package and it is done in a single step
Or another option is adorn_percentages
from janitor
library(janitor)
library(purrr)
df %>%
group_split(fruit) %>%
map_dfr(adorn_percentages, denominator = "col") %>%
as_tibble
Or using data.table
library(data.table)
setDT(df)[, (3:5) := lapply(.SD, function(x) if(all(x == 0)) 0
else x/sum(x, na.rm = TRUE)), .SDcols = 3:5, by = fruit][]
Or using base R
grpSums <- rowsum(df[3:5], df$fruit)
df[3:5] <- df[3:5]/grpSums[match(df$fruit, row.names(grpSums)),]
Upvotes: 1
Reputation: 388817
We can use prop.table
to calculate the proportions for each fruit
.
library(dplyr)
df %>%
group_by(fruit) %>%
mutate(across(where(is.numeric), prop.table),
#to replace `NaN` with 0
across(where(is.numeric), tidyr::replace_na, 0))
# fruit farm y2019 y2018 y2017
# <chr> <fct> <dbl> <dbl> <dbl>
#1 apple 1 0 0.625 0.444
#2 apple 2 0 0.375 0.556
#3 peach 1 0.2 0 0.318
#4 peach 2 0.8 0 0.682
#5 pear 1 0 0.8 0
#6 pear 2 1 0.2 0
#7 lime 1 0.4 0 0
#8 lime 2 0.6 0 0
Upvotes: 1