tpetzoldt

Reputation: 5813

Transform columns to percentages in a dplyr pipe

I am looking for a clean way to convert all columns, or a specified subset of them, to proportions of their column totals (ratios, so that 100% == 1). This is easy in base R, but I would like to integrate it seamlessly into a dplyr pipe. Ideally, I would also like to select the columns by name rather than by a numeric index such as -1.

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

set.seed(123)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = sample(1:10, 3),
                 Loc_2 = sample(1:10, 3),
                 Loc_3 = sample(1:10, 3)
)

## I am looking for a dplyr equivalent of this
## ideally avoiding numeric column indices like -1
df[,-1] <- t(t(df[,-1]) / colSums(df[-1]))

df %>%
  pivot_longer(cols=-1, names_to="Location", values_to="Abundance") %>%
  ggplot(aes(Location, Abundance, fill=Species)) + geom_bar(stat="identity") + xlab("") +
  ylab("Percentage") + scale_y_continuous(labels = percent)

Upvotes: 1

Views: 1322

Answers (2)

akrun

Reputation: 887551

We can use across() with proportions() (available in base R from version 4.0.1):

library(dplyr)
df %>%
     mutate(across(where(is.numeric), proportions))

-output

 Species     Loc_1     Loc_2     Loc_3
1       A 0.2000000 0.1818182 0.3333333
2       B 0.6666667 0.5454545 0.2666667
3       C 0.1333333 0.2727273 0.4000000
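
For a plain numeric vector, proportions() amounts to x / sum(x), so the same step can be written as an explicit lambda. Below is a minimal sketch, assuming the df from the question, that compares the lambda version with the base R one-liner. The names by_lambda and by_base are illustrative only.

```r
library(dplyr)

set.seed(123)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = sample(1:10, 3),
                 Loc_2 = sample(1:10, 3),
                 Loc_3 = sample(1:10, 3))

# Divide every numeric column by its own column total
by_lambda <- df %>%
  mutate(across(where(is.numeric), ~ .x / sum(.x)))

# Base R version from the question, applied to a copy of df
by_base <- df
by_base[, -1] <- t(t(by_base[, -1]) / colSums(by_base[-1]))

# Compare the numeric parts of both results;
# each converted column should sum to 1
all.equal(as.matrix(by_lambda[-1]), as.matrix(by_base[-1]))
colSums(by_lambda[-1])
```

The lambda form is also a convenient place for small variations, such as multiplying by 100 if percentages on a 0 to 100 scale are wanted.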

Upvotes: 2

Ronak Shah

Reputation: 389175

You can use across() with prop.table():

library(dplyr)
df %>% mutate(across(where(is.numeric), prop.table))

# Alternatively, select the columns by position or by name prefix:
#df %>% mutate(across(-1, prop.table))
#df %>% mutate(across(starts_with('Loc'), prop.table))

#  Species     Loc_1     Loc_2     Loc_3
#1       A 0.2000000 0.1818182 0.3333333
#2       B 0.6666667 0.5454545 0.2666667
#3       C 0.1333333 0.2727273 0.4000000
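
Since the question asks for column names instead of -1, it may help to note that across() also accepts names directly. Below is a sketch, assuming the df from the question. loc_cols is a hypothetical character vector introduced only for illustration.

```r
library(dplyr)

set.seed(123)
df <- data.frame(Species = c("A", "B", "C"),
                 Loc_1 = sample(1:10, 3),
                 Loc_2 = sample(1:10, 3),
                 Loc_3 = sample(1:10, 3))

# Exclude the label column by name instead of by position
res_exclude <- df %>%
  mutate(across(-Species, prop.table))

# Or convert only an explicitly named subset of columns
loc_cols <- c("Loc_1", "Loc_3")  # hypothetical subset for illustration
res_subset <- df %>%
  mutate(across(all_of(loc_cols), prop.table))

res_exclude
res_subset
```

In res_subset, Loc_2 keeps its original counts because it is not in loc_cols. Wrapping the character vector in all_of() makes the selection fail loudly if one of the named columns is missing.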

Upvotes: 4
