TDo

Reputation: 744

Divide a dataframe by the row sums of each column group

Let's say I have a dataframe A (2 rows, 4 columns):

a   b   c   d
1   2   3   4
1   3   5   4

The first 2 columns form the first group and the last 2 form the second group. I want to divide each value by the row sum of its group, so the result looks like this:

a     b     c     d
1/3   2/3   3/7   4/7
1/4   3/4   5/9   4/9

This is just a toy example. In my real problem I have many groups, not just 2.

Upvotes: 0

Views: 650

Answers (3)

G. Grothendieck

Reputation: 269644

Let g define the groupings such that all columns having the same value in g belong to the same group. Here we define g as successive pairs of columns in DF; if the groups had different sizes, we would replace that definition with whatever is appropriate.

For each row in DF we split it by g using ave to apply prop.table to each component of the split. For example, prop.table(1:2) gives c(1/3, 2/3). We assign the result to matrix mat. The last line converts mat to a data frame. We can omit this last line if a matrix is sufficient.
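To make the per-row step concrete, here is a minimal sketch of what ave does to a single row. The vector x below is just the first row of the example data written out by hand, and g is the same pairing factor used in the code that follows.

```r
# First row of the example data as a named vector (written out by hand)
x <- c(a = 1, b = 2, c = 3, d = 4)

# Grouping factor: columns 1-2 form group 1, columns 3-4 form group 2
g <- gl(2, 2)

# ave() splits x by g, applies prop.table() to each piece,
# and puts the results back in the original positions
res <- ave(x, g, FUN = prop.table)
res
```

Each pair of values in res sums to 1, because prop.table divides each piece by its own sum.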

No packages are used.

g <- gl(ncol(DF)/2, 2)  # g = c(1, 1, 2, 2)

mat <- t(apply(DF, 1, function(x) ave(x, g, FUN = prop.table)))
as.data.frame(mat)

giving:

> as.data.frame(mat)
          a         b         c         d
1 0.3333333 0.6666667 0.4285714 0.5714286
2 0.2500000 0.7500000 0.5555556 0.4444444
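As a rough sketch of the unequal-group case mentioned above, only the definition of g needs to change. The extra column e and the group sizes (2 and 3) are assumptions made up for illustration; they are not part of the question's data.

```r
# Hypothetical data: the question's four columns plus an invented column e
DF2 <- data.frame(a = c(1, 1), b = c(2, 3), c = c(3, 5), d = c(4, 4), e = c(3, 1))

# One group label per column: a, b in group 1; c, d, e in group 2 (assumed sizes)
g2 <- rep(1:2, times = c(2, 3))

# Same approach as before: normalise each row within its groups
mat2 <- t(apply(DF2, 1, function(x) ave(x, g2, FUN = prop.table)))
prop2 <- as.data.frame(mat2)
prop2
```

Within each row, the columns of a group should now sum to 1, which can be checked with rowSums(prop2[g2 == 1]) and rowSums(prop2[g2 == 2]).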

If the columns always occur in pairs then

Note

We used this as input:

DF <- structure(list(a = c(1L, 1L), b = 2:3, c = c(3L, 5L), d = c(4L, 
4L)), .Names = c("a", "b", "c", "d"), class = "data.frame", row.names = c(NA, 
-2L))

Upvotes: 1

jazzurro

Reputation: 23574

My solution is the following. I wanted to create pairs of columns by identifying even-numbered column positions (e.g., 2, 4, and 6). Then I looped through each pair and did the calculation inside lapply(). In the final step, I combined all the results using as.data.frame(). Note that your data is called mydf here.

as.data.frame(lapply(seq(from = 2, to = ncol(mydf), by = 2), function(x) {
  mydf[, (x - 1):x] / rowSums(mydf[, (x - 1):x])
}))

          a         b         c         d
1 0.3333333 0.6666667 0.4285714 0.5714286
2 0.2500000 0.7500000 0.5555556 0.4444444

Upvotes: 1

Gregor Thomas

Reputation: 145775

Here's a simple way with a for loop. I'll assume you have a list of column indices for each group:

groups = list(c(1, 2), c(3, 4))

result = dd
for (g in groups) {
  result[g] = dd[g] / rowSums(dd[g])
}

result
#           a         b         c         d
# 1 0.3333333 0.6666667 0.4285714 0.5714286
# 2 0.2500000 0.7500000 0.5555556 0.4444444

You could also use lapply like this:

result2 = do.call(cbind, lapply(groups, function(g) dd[g] / rowSums(dd[g])))

Using this input data:

dd = read.table(text = "a   b   c   d
1   2   3   4
1   3   5   4", header = TRUE)
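If the groups are known as one label per column rather than as a ready-made list of indices, the list can be built with split(). This is only a sketch: the label vector grp is a hypothetical example, and with many groups it would come from wherever the column-to-group mapping is stored.

```r
dd <- read.table(text = "a   b   c   d
1   2   3   4
1   3   5   4", header = TRUE)

# Hypothetical label vector: one group label per column of dd
grp <- c("g1", "g1", "g2", "g2")

# split() turns the labels into a list of column indices, one element per group
groups <- split(seq_along(dd), grp)

# Same loop as above: divide each group's columns by that group's row sums
result <- dd
for (g in groups) {
  result[g] <- dd[g] / rowSums(dd[g])
}
result
```

The loop itself is unchanged, so this scales to any number of groups as long as grp has one entry per column.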

Upvotes: 3
