Reputation: 572
I want to convert only the numeric columns in the dataframe below into rowwise proportions.
df <- data.frame(
  "id" = c("A", "B", "C", "D"),
  "x" = c(1, 2, 3, 4),
  "y" = c(2, 4, 6, 8)
)
So df$x[1] should be converted to 0.3333, df$y[1] should be converted to 0.6667, and so on. I want to do this dynamically with tidy code, without referring to any columns by name, and ignoring any non-numeric columns in the dataframe.
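For reference, the target values can be checked by hand in base R. This sketch names the columns explicitly only to illustrate the arithmetic (the variable names `row_total` and `expected` are made up here); it is not the dynamic solution being asked for.

```r
# The example data from the question
df <- data.frame(
  id = c("A", "B", "C", "D"),
  x  = c(1, 2, 3, 4),
  y  = c(2, 4, 6, 8)
)

# Row totals across the two numeric columns
row_total <- df$x + df$y

# Each value divided by the total of its own row
expected <- data.frame(
  id = df$id,
  x  = df$x / row_total,
  y  = df$y / row_total
)
expected
```

Every row of this particular dataset has y equal to twice x, so each row comes out as one third and two thirds.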
My current attempt, based on reading a number of similar posts, is the following:
df %>%
  mutate_if(is.numeric, . / rowSums(across(where(is.numeric))))
This returns the following error: Error: across() must only be used inside dplyr verbs.
Please help!
Upvotes: 5
Views: 1367
Reputation: 1298
Rephrase to the following:
df %>%
  mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))
Output:
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:
df %>%
  group_by(id) %>%
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  summarise_if(is.numeric, ~ . / as.numeric(sum))
The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
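If dplyr 1.1.0 or later is available, pick() (which replaced the now-deprecated cur_data()) may give a shorter pipeable form. The following is only a sketch under that version assumption, and the name `out` is made up for illustration. It relies on mutate() accepting an unnamed data frame and unpacking it into columns, so the existing x and y are overwritten.

```r
library(dplyr)

df <- data.frame(
  id = c("A", "B", "C", "D"),
  x  = c(1, 2, 3, 4),
  y  = c(2, 4, 6, 8)
)

out <- df %>%
  # pick() returns the selected columns as a data frame;
  # dividing that data frame by the row sums rescales every numeric column,
  # and the unnamed result is unpacked back into x and y
  mutate(pick(where(is.numeric)) / rowSums(pick(where(is.numeric))))

out
```

Both calls to pick() see the original values, because the division is a single expression evaluated before any column is replaced.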
Upvotes: 5
Reputation: 410
Consider adorn_percentages() from the janitor package.
janitor::adorn_percentages(df)
id x y
A 0.3333333 0.6666667
B 0.3333333 0.6666667
C 0.3333333 0.6666667
D 0.3333333 0.6666667
Upvotes: 4
Reputation: 101335
A base R option using proportions
idx <- sapply(df, is.numeric)
df[idx] <- proportions(as.matrix(df[idx]), 1)
gives
> df
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
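proportions() was only added in R 4.0.1 as a newer name for prop.table(), so on older installations the same idea can be written with prop.table(). The sketch below wraps it in a small function so the original data frame is not modified; the name `row_proportions` is hypothetical and not part of base R or any package.

```r
# Hypothetical helper: rescale all numeric columns to row-wise proportions
row_proportions <- function(data) {
  idx <- vapply(data, is.numeric, logical(1))  # TRUE for numeric columns
  if (!any(idx)) return(data)                   # nothing to rescale
  m <- as.matrix(data[idx])
  # margin = 1 divides every entry by the sum of its row
  data[idx] <- prop.table(m, margin = 1)
  data
}

df <- data.frame(
  id = c("A", "B", "C", "D"),
  x  = c(1, 2, 3, 4),
  y  = c(2, 4, 6, 8)
)

res <- row_proportions(df)
res
```

Because the function works on a copy, df itself keeps its original values and res holds the proportions.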
Upvotes: 1
Reputation: 887118
We may use reduce from purrr:
library(dplyr)
library(purrr)
df %>%
  mutate(across(where(is.numeric)) /
           select(cur_data(), where(is.numeric)) %>%
           reduce(`+`))
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
Upvotes: 2
Reputation: 388982
You can calculate the row-wise sums once, store them in a variable, and then divide each numeric column by them.
library(dplyr)
rs <- rowSums(df %>% select(where(is.numeric)), na.rm = TRUE)
df %>% mutate(across(where(is.numeric), ~./rs))
# id x y
#1 A 0.3333333 0.6666667
#2 B 0.3333333 0.6666667
#3 C 0.3333333 0.6666667
#4 D 0.3333333 0.6666667
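The na.rm = TRUE above only affects the row totals, so it may help to see what happens with missing values and all-zero rows. This base R sketch repeats the same computation on made-up data (`df2`, `rs2`, and `props` are illustrative names, not from the question).

```r
# Made-up data with a missing value (row B) and an all-zero row (row C)
df2 <- data.frame(
  id = c("A", "B", "C"),
  x  = c(1, NA, 0),
  y  = c(3, 4, 0)
)

idx <- vapply(df2, is.numeric, logical(1))   # numeric columns only
rs2 <- rowSums(df2[idx], na.rm = TRUE)       # NA is skipped when summing

props <- df2
props[idx] <- df2[idx] / rs2                 # divide each column by the row totals
props
```

In row B the missing x stays NA while y becomes 1, and in row C both values are NaN because 0 / 0 is undefined. Whether those rows should be dropped or replaced afterwards depends on the analysis.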
Upvotes: 4
Reputation: 21908
I think you can use the following solution:
library(dplyr)
library(purrr)
df[1] %>%
  bind_cols(
    pmap_df(df[-1], ~ prop.table(c(...))))
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
Here is another option, albeit a bit verbose:
library(dplyr)
library(tidyr)
df %>%
  rowwise() %>%
  mutate(output = list(prop.table(c_across(where(is.numeric))))) %>%
  unnest_wider(output) %>%
  select(-c(x, y)) %>%
  setNames(names(df))
# A tibble: 4 x 3
id x y
<chr> <dbl> <dbl>
1 A 0.333 0.667
2 B 0.333 0.667
3 C 0.333 0.667
4 D 0.333 0.667
Upvotes: 4