ADF

Reputation: 572

Tidy way to convert numeric columns from counts to proportions

I want to convert only the numeric columns in the data frame below into row-wise proportions.

df <- data.frame(
  "id" = c("A", "B", "C", "D"),
  "x" = c(1, 2, 3, 4),
  "y" = c(2, 4, 6, 8)
)

So df$x[1] should become 0.3333 and df$y[1] should become 0.6667, and so on. I want to do this with tidy code, dynamically, without referring to any columns by name and ignoring any non-numeric columns in the data frame.

My current attempt, based on reading a number of similar posts, is the following:

df %>%
  mutate_if(is.numeric, . / rowSums(across(where(is.numeric))))

This returns the following error: Error: across() must only be used inside dplyr verbs.

Please help!
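For reference: the error most likely arises because the second argument to mutate_if() has no leading `~`, so it is evaluated as an ordinary expression instead of being turned into a function, and across() therefore runs outside any dplyr verb. Independently of dplyr, the target arithmetic is just each numeric value divided by the total of the numeric values in its row. A minimal base R sketch of that arithmetic, where `row_props()` is a hypothetical helper name rather than an existing function:

```r
# Hypothetical helper (not from any package): convert every numeric column
# to row-wise proportions and leave non-numeric columns such as `id` alone.
row_props <- function(data) {
  num <- vapply(data, is.numeric, logical(1))  # TRUE for numeric columns
  totals <- rowSums(data[num])                 # one total per row
  # Dividing a data frame by a length-nrow vector divides each column
  # element-wise, so every row is divided by its own total.
  data[num] <- data[num] / totals
  data
}

df <- data.frame(
  id = c("A", "B", "C", "D"),
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8)
)

row_props(df)
```

Because the numeric columns are detected with is.numeric(), no column is referred to by name.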

Upvotes: 5

Views: 1367

Answers (6)

Rory S

Reputation: 1298

Rephrase to the following:

df %>%
  mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))

Output:

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667

Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:

df %>%
  group_by(id) %>% 
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  summarise_if(is.numeric, ~ . / as.numeric(sum))

The usual dplyr ways of referring to the current data within a function (e.g. cur_data()) don't seem to play nicely with rowSums() in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this, though, so I'm open to suggestions.
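The idea behind this workaround, computing one total per row and then dividing by it, can also be written with base R's sweep(). This is a swapped-in technique, not the dplyr code above: sweep() applies a binary function (here "/") between a matrix and a vector of summary statistics along a chosen margin, and margin 1 means one statistic per row. A minimal sketch:

```r
df <- data.frame(
  id = c("A", "B", "C", "D"),
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8)
)

num <- vapply(df, is.numeric, logical(1))  # numeric columns only
m <- as.matrix(df[num])                    # numeric matrix, one row per id

# Divide each row of `m` by the matching element of rowSums(m)
df[num] <- sweep(m, 1, rowSums(m), "/")
df
```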

Upvotes: 5

gradcylinder

Reputation: 410

Consider adorn_percentages() from the janitor package; its default denominator is "row":

janitor::adorn_percentages(df)

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667

Upvotes: 4

ThomasIsCoding

Reputation: 101335

A base R option using proportions() (available since R 4.0.0):

idx <- sapply(df, is.numeric)
df[idx] <- proportions(as.matrix(df[idx]), 1)

gives

> df
  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667
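The second argument of proportions() is the margin: 1 divides each value by its row total (what the question asks for), while 2 divides by its column total. On R versions older than 4.0.0, prop.table() takes the same arguments. A minimal sketch on the numeric part of the example data:

```r
m <- cbind(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

by_row <- proportions(m, 1)  # margin 1: each row sums to 1
by_col <- proportions(m, 2)  # margin 2: each column sums to 1

rowSums(by_row)
colSums(by_col)
```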

Upvotes: 1

akrun

Reputation: 887118

We may use reduce() from purrr to add up the numeric columns and divide by the result:

library(dplyr)
library(purrr)
df %>%
    mutate(across(where(is.numeric)) /
             (select(cur_data(), where(is.numeric)) %>% reduce(`+`)))
  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667
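Here reduce(`+`) folds `+` over the selected columns, adding them element-wise, so it yields one total per row: the same numbers rowSums() gives. Base R's Reduce() works the same way on a data frame's columns; a minimal sketch comparing the two (note that, unlike rowSums(), plain `+` has no na.rm option):

```r
df <- data.frame(
  id = c("A", "B", "C", "D"),
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8)
)

num_cols <- df[vapply(df, is.numeric, logical(1))]

# Reduce() applies `+` pairwise across the list of columns: x + y
totals_reduce <- Reduce(`+`, num_cols)
totals_rowsums <- rowSums(num_cols)

all.equal(unname(totals_reduce), unname(totals_rowsums))  # TRUE
```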

Upvotes: 2

Ronak Shah

Reputation: 388982

You can calculate the row-wise sums of the numeric columns, store them in a variable, and then divide each numeric column by them.

library(dplyr)

rs <- rowSums(df %>% select(where(is.numeric)), na.rm = TRUE)
  
df %>% mutate(across(where(is.numeric), ~./rs))

#  id         x         y
#1  A 0.3333333 0.6666667
#2  B 0.3333333 0.6666667
#3  C 0.3333333 0.6666667
#4  D 0.3333333 0.6666667
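With na.rm = TRUE the row totals skip missing values, but the NA cells themselves stay NA after the division, so the non-missing values in such a row still sum to 1. A minimal base R sketch of that behaviour, using a hypothetical data frame `df_na` with one missing value:

```r
# Hypothetical data with one missing value
df_na <- data.frame(
  id = c("A", "B"),
  x = c(1, NA),
  y = c(3, 4)
)

num <- vapply(df_na, is.numeric, logical(1))
rs <- rowSums(df_na[num], na.rm = TRUE)  # totals ignore NA: 4 and 4

df_na[num] <- df_na[num] / rs  # the NA in x stays NA
df_na
```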

Upvotes: 4

Anoushiravan R

Reputation: 21908

I think you can use the following solution:

library(dplyr)
library(purrr)

df[1] %>%
  bind_cols(
    pmap_df(df[-1], ~ prop.table(c(...))))

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667
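prop.table() applied to a plain vector divides it by its own sum, so calling it once per row, as pmap_df() does here, gives row-wise proportions (note that df[1] and df[-1] assume id is the only non-numeric column and comes first). A base R sketch of the same per-row idea uses apply() over rows; because apply() returns each row's result as a column, the result is transposed with t():

```r
df <- data.frame(
  id = c("A", "B", "C", "D"),
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8)
)

prop.table(c(x = 1, y = 2))  # a single row: x = 1/3, y = 2/3

num <- vapply(df, is.numeric, logical(1))
# apply(..., 1, f) returns one column per input row, hence t()
df[num] <- t(apply(as.matrix(df[num]), 1, prop.table))
df
```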

Also this one, albeit a bit verbose (note that it names x and y explicitly in select()):

library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  mutate(output = list(prop.table(c_across(where(is.numeric))))) %>%
  unnest_wider(output) %>%
  select(-c(x, y)) %>%
  setNames(names(df))

# A tibble: 4 x 3
  id        x     y
  <chr> <dbl> <dbl>
1 A     0.333 0.667
2 B     0.333 0.667
3 C     0.333 0.667
4 D     0.333 0.667

Upvotes: 4
