user 31466
user 31466

Reputation: 699

Sum of a column for different values of another column

I have the following data frame:

d <- data.frame(x = c(6,6,6,17,17,17,68,68,68), w = c(4,2,1,0,5,1,0,2,7))

Now I want to add a new column sw such that

That is,

d
   x w sw
1  6 4 22
2  6 2 22
3  6 1 22
4 17 0 15
5 17 5 15
6 17 1 15
7 68 0  9
8 68 2  9
9 68 7  9

For this example, I can do the following:

j = sum(d$w)
k = sum(d$w[d$x!=6])
l = sum(d$w[d$x==68])

d$sw <- c(rep(j,3),rep(k,3),rep(l,3))

But for 100 different values of x, I can't do such coding, even I may not know what values is x taking because I will generate the x=round(rexp(1000,1/100)) .

Upvotes: 2

Views: 588

Answers (3)

lmo
lmo

Reputation: 38520

Here is a third base R solution that first calculates the values and then merges them to the original data.frame.

merge(df, within(aggregate(w ~ x, data=df, sum), sw <- rev(cumsum(rev(w))))[-2], by="x")
   x w sw
1  6 4 22
2  6 2 22
3  6 1 22
4 17 0 15
5 17 5 15
6 17 1 15
7 68 0  9
8 68 2  9
9 68 7  9

aggregate(w ~ x, data=df, sum) calculates the sum of w for each x group, Then sw <- rev(cumsum(rev(w))) creates the cumulative sum using two revs first to get the correct calculation and second to put it in the proper order. These functions are wrapped in within so a copy of this data.frame is returned, and [-2] drops the now extraneous w term calculated by aggregate. This aggregated data.frame is then merged onto the original data.frame.

data

df <- 
structure(list(x = c(6L, 6L, 6L, 17L, 17L, 17L, 68L, 68L, 68L
), w = c(4L, 2L, 1L, 0L, 5L, 1L, 0L, 2L, 7L)), .Names = c("x", 
"w"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"
), class = "data.frame")

Upvotes: 1

Sotos
Sotos

Reputation: 51622

Another idea via base R using Reduce with accumulate = TRUE argument, i.e.

ind <- aggregate(w ~ x, d, length)$w #get lengths of each group
rslts <- rev(unlist(lapply(Reduce(`rbind`, rev(split(d, d$x)), accumulate = TRUE), 
                                                                 function(i) sum(i$w))))
d$sw <- rep(rslts, ind)

d
#   x w sw
#1  6 4 22
#2  6 2 22
#3  6 1 22
#4 17 0 15
#5 17 5 15
#6 17 1 15
#7 68 0  9
#8 68 2  9
#9 68 7  9

Upvotes: 2

akrun
akrun

Reputation: 887991

We can do this with base R

v1 <- sapply(unique(d$x), function(x) rep(sum(d$w[match(x, d$x):nrow(d)]), length(x)))
d$sw <- rep(v1, table(d$x))

Upvotes: 2

Related Questions