Reputation: 5813
I am looking for a clean way to convert all, or a specified subset, of columns to percentages (as ratios, i.e. 100% == 1). This is easy in base R, but I would like to integrate it seamlessly into a dplyr pipe. Ideally, I would also select the columns by name rather than by a numeric index such as -1.
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
set.seed(123)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = sample(1:10, 3),
                 Loc_2 = sample(1:10, 3),
                 Loc_3 = sample(1:10, 3))
## I am looking for a dplyr equivalent of this
## ideally avoiding numeric column indices like -1
df[,-1] <- t(t(df[,-1]) / colSums(df[-1]))
df %>%
  pivot_longer(cols = -1, names_to = "Location", values_to = "Abundance") %>%
  ggplot(aes(Location, Abundance, fill = Species)) +
  geom_bar(stat = "identity") +
  xlab("") +
  ylab("Percentage") +
  scale_y_continuous(labels = percent)
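For reference, the base R line works because dividing a matrix by a vector recycles the vector down each column, so element [i, j] is divided by v[i]. Transposing first turns each Loc column into a row, so row i is divided by that column's sum, and the second t() restores the original shape. The sketch below compares that trick with two other base R spellings. It assumes hard-coded illustrative counts instead of the random draw (chosen so the proportions match the output printed in the answers, not necessarily what set.seed(123) produces), and the names num, loc_cols and via_* are hypothetical.

```r
# Hypothetical counts standing in for the question's randomly sampled df
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = c(3, 10, 2),
                 Loc_2 = c(2, 6, 3),
                 Loc_3 = c(5, 4, 6))

num <- df[-1]  # numeric columns only

# 1. The question's trick: matrix / vector recycles the vector down the
#    columns, so transpose first to line each column sum up with its column
via_transpose <- t(t(num) / colSums(num))

# 2. sweep() states the intent directly: divide along margin 2 (columns)
via_sweep <- sweep(as.matrix(num), 2, colSums(num), "/")

# 3. Column by column on the data frame itself, selecting columns by name
loc_cols <- c("Loc_1", "Loc_2", "Loc_3")
via_lapply <- df
via_lapply[loc_cols] <- lapply(df[loc_cols], function(x) x / sum(x))

print(via_lapply)
```

The lapply() variant keeps the result a data frame and selects columns by name, which is closest to what the dplyr answers below do inside mutate().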
Upvotes: 1
Views: 1322
Reputation: 887551
We can use proportions() (base R, available since R 4.0.0) inside across():
library(dplyr)
df %>%
mutate(across(where(is.numeric), proportions))
Output:
Species Loc_1 Loc_2 Loc_3
1 A 0.2000000 0.1818182 0.3333333
2 B 0.6666667 0.5454545 0.2666667
3 C 0.1333333 0.2727273 0.4000000
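proportions(x) is simply x / sum(x), so across() can also apply it to only a named subset of columns, which covers the "specified subset" part of the question. The sketch below assumes dplyr >= 1.0.0 (for across()) and R >= 4.0.0 (for proportions()). The data values and the names subset_pct, subset_pct2 and cols are hypothetical.

```r
library(dplyr)

# Hypothetical counts (hard-coded, not the question's random draw)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = c(3, 10, 2),
                 Loc_2 = c(2, 6, 3),
                 Loc_3 = c(5, 4, 6))

# Convert only the columns named here; Loc_2 keeps its raw counts
subset_pct <- df %>%
  mutate(across(c(Loc_1, Loc_3), proportions))

# The same selection from a character vector of names, via all_of()
cols <- c("Loc_1", "Loc_3")
subset_pct2 <- df %>%
  mutate(across(all_of(cols), proportions))

print(subset_pct)
```

all_of() is useful when the column names arrive as a variable (for example, a function argument) rather than being typed into the pipe.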
Upvotes: 2
Reputation: 389175
You can use across() with prop.table:
library(dplyr)
df %>% mutate(across(where(is.numeric), prop.table))
# You can also use either of these, depending on your preference
#df %>% mutate(across(-1, prop.table))
#df %>% mutate(across(starts_with('Loc'), prop.table))
# Species Loc_1 Loc_2 Loc_3
#1 A 0.2000000 0.1818182 0.3333333
#2 B 0.6666667 0.5454545 0.2666667
#3 C 0.1333333 0.2727273 0.4000000
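To avoid the positional -1 that the question wants to get rid of, the label column can be excluded by name with -Species, and the conversion can sit directly in front of the question's pivot_longer() step. A minimal sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 (for pivot_longer()), with hypothetical data and the hypothetical name long_pct:

```r
library(dplyr)
library(tidyr)

# Hypothetical counts (hard-coded, not the question's random draw)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = c(3, 10, 2),
                 Loc_2 = c(2, 6, 3),
                 Loc_3 = c(5, 4, 6))

long_pct <- df %>%
  # -Species excludes the label column by name instead of by position (-1)
  mutate(across(-Species, prop.table)) %>%
  pivot_longer(cols = -Species, names_to = "Location", values_to = "Abundance")

# Sanity check: each location's abundances should now sum to 1 (100%)
long_pct %>%
  group_by(Location) %>%
  summarise(total = sum(Abundance))
```

long_pct has the Location, Abundance and Species columns that the question's ggplot() call expects, so the plotting code can be piped on unchanged.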
Upvotes: 4