
Divide each column sum by the sum of the matrix

If I have a dataframe:

d = data.frame(sample=c("a2","a3"),a=c(1,5),b=c(4,5),c=c(6,4))
d
    sample a b c
1     a2 1 4 6
2     a3 5 5 4

How do I divide the sum of each column by the sum of the entire dataframe using dplyr so I end up with a dataframe that looks like:

     a b c
1    6/25 9/25 10/25

I tried to do

d <- d %>%
mutate_if(is.numeric, funs(colSums(d)/sum(d)))

but it keeps returning an error.

Thanks in advance!

Upvotes: 4

Answers (4)

G. Grothendieck

Reputation: 269644

If we can assume that only the first column is non-numeric, then in every alternative except 2a and 2b the first two components of the pipeline can be replaced with d[-1].

1) Base R With base R we get a straightforward solution:

d |> Filter(f = is.numeric) |> colSums() |> prop.table()
##    a    b    c 
## 0.24 0.36 0.40
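As a check on the arithmetic, the same result can be built step by step. This is only a sketch using the data from the question; the names col_totals, grand_total and shares are illustrative and not part of the original answer.

```r
# Data from the question
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# Sum each numeric column (d[-1] drops the non-numeric first column)
col_totals <- colSums(d[-1])    # a = 6, b = 9, c = 10

# Sum of all numeric cells
grand_total <- sum(col_totals)  # 25

# Each column's share of the total: 6/25, 9/25, 10/25
shares <- col_totals / grand_total
shares
```

For a plain numeric vector, prop.table(x) computes x / sum(x), which is why the one-line pipeline gives the same numbers.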

2) dplyr With dplyr:

library(dplyr)

d %>%
  select(where(is.numeric)) %>%
  summarize(across(everything(), sum) / sum(.))
##      a    b   c
## 1 0.24 0.36 0.4

2a) or

d %>%
  summarize(across(where(is.numeric), sum)) %>%
  { . / sum(.) }

2b) The scoped functions such as the *_if functions have been superseded by across and are rarely used these days, but they are still available. If you want to use them anyway, try this, which is close to the code in the question:

d %>%
  summarize_if(is.numeric, sum) %>%
  { . / sum(.) }
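The braces in 2a and 2b matter: with the magrittr pipe, wrapping the right-hand side in { } stops the left-hand side from being inserted as a first argument, so the dot can be used freely inside the expression. A minimal sketch, assuming dplyr is installed; totals, piped and direct are illustrative names only:

```r
library(dplyr)

d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# One-row data frame holding the column sums of the numeric columns
totals <- d %>% summarize(across(where(is.numeric), sum))

# Inside the braces, `.` is the one-row data frame `totals`
piped <- totals %>% { . / sum(.) }

# The same computation written without the pipe
direct <- totals / sum(totals)

piped
```

Both forms divide every column sum by the sum of all of them, so piped and direct hold the same values.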

3) collapse With the collapse package, get the numeric variables (nv), sum each column (fsum) and then take proportions. When I benchmarked it on this data it ran 3x faster than (1), over 100x faster than (2) and 300x faster than (4).

library(collapse)
d |> nv() |> fsum() |> fsum(TRA = "/")
##    a    b    c 
## 0.24 0.36 0.40

4) dplyr/tidyr With tidyr and dplyr we can convert to long form, process and convert back.

library(dplyr)
library(tidyr)
d %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(value = sum(value) / sum(.$value), .groups = "drop") %>%
  pivot_wider
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1  0.24  0.36   0.4

Upvotes: 6

ThomasIsCoding

Reputation: 101403

Another base R option

> colSums(d[-1] / sum(d[-1]))
   a    b    c
0.24 0.36 0.40

Upvotes: 1

tmfmnk

Reputation: 39858

One dplyr possibility could be:

d %>%
    summarise(across(-1, sum)/sum(cur_data()[-1]))

     a    b   c
1 0.24 0.36 0.4

Or:

d %>%
    summarise(across(where(is.numeric), sum)/sum(across(where(is.numeric))))

Upvotes: 3

TarJae

Reputation: 78927

We could use colSums and divide by the sum of the column sums. The -1 index excludes the first column from the calculation.

result <- colSums(d[,-1])/sum(colSums(d[,-1]))
result

Output:

   a    b    c 
0.24 0.36 0.40
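The result above is a named numeric vector, whereas the question asks for a one-row data frame. One possible way to convert it, shown here as a sketch in base R (result_df is an illustrative name):

```r
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# Named vector of proportions, as computed above
result <- colSums(d[, -1]) / sum(colSums(d[, -1]))

# t() turns the vector into a 1 x 3 matrix with column names a, b, c;
# as.data.frame() then gives a one-row data frame
result_df <- as.data.frame(t(result))
result_df
```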

Upvotes: 3
