Reputation: 109
I have a tibble structured as follows:
day theta
1 1 2.1
2 1 2.1
3 2 3.2
4 2 3.2
5 5 9.5
6 5 9.5
7 5 9.5
Note that the tibble contains multiple rows for each day
, and for each day
the same value for theta
is repeated an arbitrary number of times. (The tibble contains other arbitrary columns necessitating this repeating structure.)
I'd like to use dplyr
to cumulatively sum values for theta
across days such that, in the example above, 2.1
is added only a single time to 3.2
, etc. The tibble would be mutated so as to append the new cumulative sum (c.theta
) as follows:
day theta c.theta
1 1 2.1 2.1
2 1 2.1 2.1
3 2 3.2 5.3
4 2 3.2 5.3
5 5 9.5 14.8
6 5 9.5 14.8
7 5 9.5 14.8
...
My initial efforts to group_by
day
and then cumsum
over theta
resulted only in cumulative summing over the full set of data (e.g., 2.1 + 2.1 + 3.2 ...
) which is undesirable. In my Stack Overflow searches, I can find many examples of cumulative summing within groups, but never between groups, as I describe above. Nudges in the right direction would be much appreciated.
Upvotes: 0
Views: 123
Reputation: 38500
In base R you could use split<-
and tapply
to return the desired result.
# construct 0 vector to fill in
dat$temp <- 0
# fill in with cumulative sum for each day
split(dat$temp, dat$day) <- cumsum(tapply(dat$theta, dat$day, head, 1))
Here, tapply
returns the first element of theta for each day which is is fed to cumsum
. The elements of cumulative sum are fed to each day using split<-
.
This returns
dat
day theta temp
1 1 2.1 2.1
2 1 2.1 2.1
3 2 3.2 5.3
4 2 3.2 5.3
5 5 9.5 14.8
6 5 9.5 14.8
7 5 9.5 14.8
Upvotes: 1
Reputation: 28329
Not a dplyr
, but just an alternative data.table
solution:
library(data.table)
# Original table is called d
setDT(d)
merge(d, unique(d)[, .(c.theta = cumsum(theta), day)])
day theta c.theta
1: 1 2.1 2.1
2: 1 2.1 2.1
3: 2 3.2 5.3
4: 2 3.2 5.3
5: 5 9.5 14.8
6: 5 9.5 14.8
7: 5 9.5 14.8
PS: If you want to preserve other columns you have to use unique(d[, .(day, theta)])
Upvotes: 2
Reputation: 60060
Doing this in dplyr
I came up with a very similar solution to PoGibas - use distinct
to just get one row per day, find the sum and merge back in:
df = read.table(text="day theta
1 1 2.1
2 1 2.1
3 2 3.2
4 2 3.2
5 5 9.5
6 5 9.5
7 5 9.5", header = TRUE)
cumsums = df %>%
distinct(day, theta) %>%
mutate(ctheta = cumsum(theta))
df %>%
left_join(cumsums %>% select(day, ctheta), by = 'day')
Upvotes: 3