Reputation: 1
I have a "big" data frame on which I need to perform a calculation like the one below:
data <- data.frame(
  "name" = c("Tom", "Peter", "Peter", "Peter", "Tom", "Peter"),
  "goal" = c(1, -2, 2, 3, -1, 0),
  "total" = 0
)
for (i in 1:nrow(data)) {
  count <- 0
  for (j in 1:i) {
    if (data$name[j] == data$name[i]) {
      count <- count + data$goal[j]
    }
  }
  data$total[i] <- count
}
> data
   name goal total
1   Tom    1     1
2 Peter   -2    -2
3  John    2     2
4 Peter    3     1
5   Tom   -1     0
6 Peter    0     1
For each row, the "total" column should hold the running sum of the goals scored by the same player up to and including that row.
My data frame currently has 83,000 rows and the calculation takes a very long time. I would like to do it without a "for" loop. Do you have any ideas?
I saw the following post but I don't know how to adapt it.
Thanks in advance
Upvotes: 0
Views: 47
Reputation: 431
If you want to avoid for loops, try to find vectorized functions that do what you want (or functions that work on data frames or other multidimensional objects).
For your example, you can group the data frame by name using group_by from dplyr and then apply the vectorized function cumsum (cumulative sum) within each group:
library(dplyr)
data <- data %>% group_by(name) %>% mutate(total = cumsum(goal))
Output
> data
# A tibble: 6 x 3
# Groups: name [2]
  name   goal total
  <chr> <dbl> <dbl>
1 Tom       1     1
2 Peter    -2    -2
3 Peter     2     0
4 Peter     3     3
5 Tom      -1     0
6 Peter     0     3
I used the data frame initialization from your post, where row 3 is "Peter" rather than "John", which is why my output differs from yours.
If you want to drop the grouping after your manipulation, use ungroup().
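As a minimal sketch of that last step, the pipeline below appends ungroup() so that later operations act on all rows again. It reuses the initialization from the question. The names result and total_base are only illustrative, and the base R cross-check with ave() is an addition of mine, not part of the approach above:

```r
library(dplyr)

# Same initialization as in the question (the "total" column is created below)
data <- data.frame(
  name = c("Tom", "Peter", "Peter", "Peter", "Tom", "Peter"),
  goal = c(1, -2, 2, 3, -1, 0)
)

# Running sum of goals per name, then drop the grouping
result <- data %>%
  group_by(name) %>%
  mutate(total = cumsum(goal)) %>%
  ungroup()

# Base R equivalent without dplyr: ave() applies cumsum within each name
total_base <- ave(data$goal, data$name, FUN = cumsum)

# Both approaches should give the same totals
stopifnot(isTRUE(all.equal(result$total, total_base)))
```

Both versions keep the original row order, so the result can be compared row by row with the loop from the question.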
Upvotes: 1