Reputation: 20107
I have a dataframe that looks like this (truncated from real data):
host month score se
1 V43 0 8.000000 0.4472136
2 V43 1 6.000000 0.0000000
3 V43 3 6.000000 0.0000000
4 V51 0 6.000000 0.0000000
5 V51 1 7.333333 0.4216370
6 V51 3 7.333333 0.2108185
7 V51 6 6.000000 0.0000000
I want to subtract each host's month 0 score from the score for every month for that host. Each host's month 0 score needs to be applied separately, so that the result looks like this:
host month score se
1 V43 0 0.000000 0.4472136
2 V43 1 -2.000000 0.0000000
3 V43 3 -2.000000 0.0000000
4 V51 0 0.000000 0.0000000
5 V51 1 1.333333 0.4216370
6 V51 3 1.333333 0.2108185
7 V51 6 0.000000 0.0000000
In other words, I want each month to show the difference from the starting point rather than the absolute value.
Finding the month 0 rows is easy enough, but I can't figure out how to match each row with the right host and do the subtraction. Is there a way to do this without using a for loop?
Upvotes: 2
Views: 1792
Reputation: 1200
Here is one way to do it. It uses a for loop, but the loop runs once per host rather than once per row of your data frame.
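# Example data: two hosts, each with a month 0 baseline row
x <- data.frame(host = c(43, 43, 43, 51, 51, 51, 51), month = c(0, 1, 2, 0, 2, 4, 5), val = c(12, 19, 32, 3, 5, 7, 9))
# Split into a list with one data frame per host
y <- split(x, x$host)
output <- NULL
for (h in y) {
  # Row index of this host's month 0 baseline
  start.i <- which(h$month == 0)
  # Subtract the baseline from every value for this host
  h$val <- h$val - h$val[start.i]
  output <- rbind(output, h)
}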
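If you want to avoid the loop entirely, a baseline lookup with match() is one possibility. This is only a sketch using the data from the question; the names df, baseline and out are my own, and it assumes every host has exactly one month 0 row.

```r
# Data copied from the question
df <- data.frame(
  host  = c("V43", "V43", "V43", "V51", "V51", "V51", "V51"),
  month = c(0, 1, 3, 0, 1, 3, 6),
  score = c(8, 6, 6, 6, 7.333333, 7.333333, 6),
  se    = c(0.4472136, 0, 0, 0, 0.4216370, 0.2108185, 0)
)

# One row per host holding its month 0 score
# (assumption: exactly one month 0 row per host)
baseline <- df[df$month == 0, c("host", "score")]

# For each row, find its host's baseline and subtract it
out <- df
out$score <- df$score - baseline$score[match(df$host, baseline$host)]
out
```

Because the lookup is by host rather than by position, this should not depend on how the rows are ordered.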
Upvotes: 0
Reputation: 57686
Use plyr, and order your data frame by host and month first, so that each host's month 0 row is the first row in its group.
library(plyr)
ddply(df[order(df$host, df$month), ], .(host), transform, score = score - score[1])
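If plyr is not available, roughly the same idea can be sketched in base R with ave(). The data is from the question, deliberately shuffled to show why the ordering step matters; the names df and out are my own, and the code assumes month 0 is the smallest month for every host.

```r
# Data copied from the question
df <- data.frame(
  host  = c("V43", "V43", "V43", "V51", "V51", "V51", "V51"),
  month = c(0, 1, 3, 0, 1, 3, 6),
  score = c(8, 6, 6, 6, 7.333333, 7.333333, 6),
  se    = c(0.4472136, 0, 0, 0, 0.4216370, 0.2108185, 0)
)

# Shuffle the rows so the month 0 rows are no longer first in each host
df <- df[c(3, 1, 2, 7, 5, 6, 4), ]

# Order by host and month so that month 0 comes first within each host
out <- df[order(df$host, df$month), ]

# Within each host, subtract the first score (the month 0 score after ordering)
out$score <- ave(out$score, out$host, FUN = function(s) s - s[1])
out
```

Without the order() step, s[1] would be whichever row happens to come first for that host, not necessarily month 0.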
Upvotes: 1