Reputation: 744
Let's say I have a data frame A (2 rows, 4 columns):
a b c d
1 2 3 4
1 3 5 4
The first 2 columns are in the first group, the last 2 are in the second group. I want to divide this df by the row sums of each group. Basically I want something like this:
a b c d
1/3 2/3 3/7 4/7
1/4 3/4 5/9 4/9
This is just a toy example. In my problem I have a lot of groups, not just 2.
Upvotes: 0
Views: 650
Reputation: 269644
Let g define the groupings such that every column having the same value in g belongs to the same group. Here we defined g to be successive pairs of columns in DF, but if the groups had various sizes we would replace that with whatever definition were appropriate.
For each row in DF we split it by g using ave to apply prop.table to each component of the split. For example, prop.table(1:2) gives c(1/3, 2/3).
We assign the result to matrix mat. The last line converts mat to a data frame. We can omit this last line if a matrix is sufficient.
No packages are used.
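To see what this does for a single row, here is a minimal sketch. The vector x is a hypothetical stand-in for the first row of DF, and row1 is just an illustrative name:

```r
# Hypothetical stand-in for the first row of DF
x <- c(a = 1, b = 2, c = 3, d = 4)
g <- gl(2, 2)  # factor with values 1, 1, 2, 2

# prop.table() on a plain vector divides each element by the vector's sum
prop.table(x[1:2])

# ave() applies prop.table() within each level of g and keeps the original order
row1 <- ave(x, g, FUN = prop.table)
row1
```

The apply() call in the full solution repeats this step for every row.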
g <- gl(ncol(DF)/2, 2) # g = c(1, 1, 2, 2)
mat <- t(apply(DF, 1, function(x) ave(x, g, FUN = prop.table)))
as.data.frame(mat)
giving:
> prop
a b c d
1 0.3333333 0.6666667 0.4285714 0.5714286
2 0.2500000 0.7500000 0.5555556 0.4444444
If the columns always occur in pairs then
We used this as input:
DF <- structure(list(a = c(1L, 1L), b = 2:3, c = c(3L, 5L), d = c(4L,
4L)), .Names = c("a", "b", "c", "d"), class = "data.frame", row.names = c(NA,
-2L))
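As a sketch of the remark about groups of different sizes, the same approach should work with any grouping vector that has one label per column. The data frame DF2 and the labels in g2 below are made up for illustration:

```r
# Hypothetical input: five columns, grouped as (a, b, c) and (d, e)
DF2 <- data.frame(a = c(1, 1), b = c(2, 3), c = c(3, 5),
                  d = c(4, 4), e = c(6, 1))

# One label per column; columns sharing a label form one group
g2 <- rep(c("first", "second"), times = c(3, 2))

# Within each row, scale every group so that it sums to 1
mat2 <- t(apply(DF2, 1, function(x) ave(x, g2, FUN = prop.table)))
res2 <- as.data.frame(mat2)
res2
```

Only the definition of the grouping vector changes; the apply/ave line is the same as before.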
Upvotes: 1
Reputation: 23574
My solution was the following. I wanted to create pairs of columns by identifying even-numbered column positions (e.g., 2, 4, and 6). Then, I looped through each pair and handled the calculation in lapply(). In the final step, I combined all results using as.data.frame(). Note that your data is called mydf here.
as.data.frame(lapply(seq(from = 2, to = ncol(mydf), by = 2), function(x) {
mydf[, (x-1):x] / rowSums(mydf[, (x-1):x])}
))
a b c d
1 0.3333333 0.6666667 0.4285714 0.5714286
2 0.2500000 0.7500000 0.5555556 0.4444444
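If the groups are equal-width blocks wider than two columns, the same idea might be generalised as in the sketch below. The block width k and the names ends, cols and out are assumptions introduced for illustration, and the number of columns is assumed to be a multiple of k:

```r
mydf <- data.frame(a = c(1, 1), b = c(2, 3), c = c(3, 5), d = c(4, 4))

k <- 2  # assumed block width; ncol(mydf) must be a multiple of k

# Last column position of each block: k, 2k, 3k, ...
ends <- seq(from = k, to = ncol(mydf), by = k)

out <- as.data.frame(lapply(ends, function(x) {
  cols <- (x - k + 1):x
  # drop = FALSE keeps a data frame even when k is 1
  mydf[, cols, drop = FALSE] / rowSums(mydf[, cols, drop = FALSE])
}))
out
```

With k = 2 this reduces to the pairwise version above.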
Upvotes: 1
Reputation: 145775
Here's a simple way with a for loop. I'll assume you have a list of column indices for each group:
groups = list(c(1, 2), c(3, 4))
result = dd
for (g in groups) {
result[g] = dd[g] / rowSums(dd[g])
}
result
# a b c d
# 1 0.3333333 0.6666667 0.4285714 0.5714286
# 2 0.2500000 0.7500000 0.5555556 0.4444444
You could also use lapply like this:
result2 = do.call(cbind, lapply(groups, function(g) dd[g] / rowSums(dd[g])))
Using this input data:
dd = read.table(text = "a b c d
1 2 3 4
1 3 5 4", header = TRUE)
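With many groups, writing the list of indices by hand gets tedious. One possible way to build it, assuming you have a vector with one group label per column (the labels vector below is hypothetical), is to split the column positions by label:

```r
dd <- read.table(text = "a b c d
1 2 3 4
1 3 5 4", header = TRUE)

# Hypothetical labels, one per column of dd
labels <- c("g1", "g1", "g2", "g2")

# Split the column positions 1..ncol(dd) by label
groups <- split(seq_along(dd), labels)

result <- dd
for (g in groups) {
  result[g] <- dd[g] / rowSums(dd[g])
}
result
```

The loop itself is unchanged; only the construction of groups differs.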
Upvotes: 3