How to divide all columns by the sum of columns

Question

I have a data set to which I need to apply some simple normalization. I want to calculate the column sums with colSums(DF) and then divide every value in a column by that column's sum. This is what I did, and it seems to work, but I cannot tell whether the correct column sum was used for each column. My data frame looks like this:

structure(list(`2E` = c(28L, 9736L, 20L, 221L, 349L, 21L), `2I` = c(42L, 
8254L, 0L, 292L, 106L, 0L), `6E` = c(49L, 4303L, 0L, 1L, 258L, 
0L), `6I` = c(0L, 3409L, 0L, 70L, 92L, 0L), `15E` = c(0L, 4178L, 
0L, 121L, 106L, 12L), `15I` = c(0L, 3L, 0L, 0L, 0L, 0L), `16E` = c(25L, 
9715L, 4L, 167L, 533L, 30L), `16I` = c(0L, 5082L, 12L, 112L, 
35L, 0L), `18E` = c(0L, 7425L, 0L, 134L, 324L, 0L), `18I` = c(0L, 
15822L, 0L, 565L, 78L, 0L), `20E` = c(0L, 69881L, 0L, 2240L, 
3764L, 189L), `20I` = c(0L, 27718L, 0L, 837L, 312L, 239L), `21E` = c(0L, 
8841L, 5L, 241L, 458L, 12L), `21I` = c(0L, 308L, 0L, 9L, 14L, 
0L), `22E` = c(52L, 34347L, 0L, 523L, 1861L, 44L), `22I` = c(0L, 
4202L, 0L, 152L, 58L, 0L), `23E` = c(0L, 3742L, 0L, 30L, 185L, 
0L), `23I` = c(31L, 3766L, 0L, 108L, 38L, 12L), `25E` = c(0L, 
3647L, 0L, 26L, 189L, 0L), `25I` = c(0L, 11243L, 0L, 903L, 85L, 
168L), `26E` = c(0L, 8162L, 0L, 56L, 753L, 0L), `26I` = c(0L, 
6325L, 3L, 229L, 85L, 0L), `27E` = c(22L, 7548L, 0L, 119L, 213L, 
0L), `27I` = c(4L, 8949L, 0L, 1009L, 114L, 0L), `28E` = c(0L, 
6103L, 0L, 100L, 319L, 68L), `28I` = c(0L, 13306L, 0L, 582L, 
57L, 0L), `29E` = c(0L, 3608L, 9L, 54L, 142L, 27L), `29I` = c(0L, 
5035L, 0L, 138L, 84L, 0L), `30E` = c(0L, 27795L, 0L, 593L, 1680L, 
35L), `30I` = c(0L, 5506L, 0L, 146L, 75L, 0L), `32E` = c(13L, 
12516L, 22L, 230L, 745L, 17L), `32I` = c(0L, 1271L, 0L, 29L, 
13L, 0L), `33E` = c(0L, 3551L, 0L, 0L, 148L, 0L), `33I` = c(0L, 
15957L, 0L, 550L, 1L, 0L), `34E` = c(0L, 1852L, 0L, 18L, 138L, 
0L), `34I` = c(0L, 10469L, 0L, 243L, 119L, 0L), `35E` = c(0L, 
9570L, 0L, 362L, 671L, 0L), `35I` = c(19L, 4953L, 0L, 25L, 32L, 
23L), `36E` = c(0L, 2497L, 15L, 55L, 125L, 4L), `36I` = c(0L, 
1839L, 11L, 39L, 0L, 0L), `38E` = c(0L, 940L, 0L, 38L, 50L, 0L
), `38I` = c(0L, 2301L, 0L, 60L, 14L, 8L), `39E` = c(0L, 5324L, 
0L, 107L, 92L, 41L), `39I` = c(0L, 8360L, 0L, 262L, 13L, 0L), 
    `40E` = c(15L, 6107L, 10L, 183L, 173L, 13L), `40I` = c(8L, 
    1517L, 0L, 16L, 10L, 0L), `42E` = c(0L, 14681L, 35L, 312L, 
    282L, 54L), `42I` = c(0L, 7385L, 1L, 138L, 48L, 0L)), .Names = c("2E", 
"2I", "6E", "6I", "15E", "15I", "16E", "16I", "18E", "18I", "20E", 
"20I", "21E", "21I", "22E", "22I", "23E", "23I", "25E", "25I", 
"26E", "26I", "27E", "27I", "28E", "28I", "29E", "29I", "30E", 
"30I", "32E", "32I", "33E", "33I", "34E", "34I", "35E", "35I", 
"36E", "36I", "38E", "38I", "39E", "39I", "40E", "40I", "42E", 
"42I"), row.names = c("DQ459412", "DQ459413", "DQ459415", "DQ459418", 
"DQ459419", "DQ459420"), class = "data.frame")

So I have my data frame and calculate the colSums, and then I simply did counts / colSums. Will this use all the values in colSums, or just the first one?

It is also important that each column sum is matched to the column with the same name in the count data frame. In other words, each column should be divided by its own column sum.

Daniel Falbel · Accepted Answer

Look at what R does when you divide a data.frame by a vector:

> x <- data.frame(x = rep(1, 5), y = rep(1, 5))
> x/c(1,2)
    x   y
1 1.0 0.5
2 0.5 1.0
3 1.0 0.5
4 0.5 1.0
5 1.0 0.5

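It's the same when you compute data.frame/colSums(data.frame): the vector of sums is recycled cell by cell down the columns, so it is not matched one sum per column.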
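One way to get the per-column division the question asks for is base R's sweep(), which applies the i-th statistic to the i-th column when MARGIN = 2. The sketch below is only an illustration: the small `counts` data frame and the names `naive` and `normalized` are made up for the example and are not the asker's data.

```r
# Small stand-in for the count table (hypothetical values, not the asker's data)
counts <- data.frame(a = c(1, 1, 2), b = c(10, 30, 60))

# Naive division: colSums(counts) is recycled cell by cell down the columns,
# so most cells are divided by the wrong sum
naive <- counts / colSums(counts)

# sweep() with MARGIN = 2 divides each column by its own column sum
normalized <- sweep(counts, 2, colSums(counts), "/")

# Sanity check: every column of a correctly normalized table sums to 1
colSums(normalized)
colSums(naive)
```

Checking that colSums() of the result is 1 for every column is a quick way to confirm that the right sum was used per column. Note that a column whose sum is 0 would produce NaN values under this approach.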
