Reputation: 747
To manipulate/summarize data over time, I usually use SQL ROW_NUMBER() OVER(PARTITION by ...). I'm new to R, so I'm trying to recreate tables I otherwise would create in SQL. The package sqldf does not allow OVER clauses. Example table:
ID Day Person Cost
1 1 A 50
2 1 B 25
3 2 A 30
4 3 B 75
5 4 A 35
6 4 B 100
7 6 B 65
8 7 A 20
I want my final table to include the average of the previous 2 instances for each day after their 2nd instance (day 4 for both):
ID Day Person Cost Prev2
5 4 A 35 40
6 4 B 100 50
7 6 B 65 90
8 7 A 20 35
I've been trying to play around with aggregate
, but I'm not really sure how to partition or qualify the function. Ideally, I'd prefer not to use the fact that id
is sequential with the date to form my answer (i.e. original table could be rearranged with random date order and code would still work). Let me know if you need more details, thanks for your help!
Upvotes: 1
Views: 1733
Reputation: 43344
You could lag zoo::rollapplyr
with a width of 2. In dplyr,
library(dplyr)
df %>% arrange(Day) %>% # sort
group_by(Person) %>% # set grouping
mutate(Prev2 = lag(zoo::rollapplyr(Cost, width = 2, FUN = mean, fill = NA)))
#> Source: local data frame [8 x 5]
#> Groups: Person [2]
#>
#> ID Day Person Cost Prev2
#> <int> <int> <fctr> <int> <dbl>
#> 1 1 1 A 50 NA
#> 2 2 1 B 25 NA
#> 3 3 2 A 30 NA
#> 4 4 3 B 75 NA
#> 5 5 4 A 35 40.0
#> 6 6 4 B 100 50.0
#> 7 7 6 B 65 87.5
#> 8 8 7 A 20 32.5
or all in dplyr,
df %>% arrange(Day) %>% group_by(Person) %>% mutate(Prev2 = (lag(Cost) + lag(Cost, 2)) / 2)
which returns the same thing. In base,
df <- df[order(df$Day), ]
df$Prev2 <- ave(df$Cost, df$Person, FUN = function(x){
c(NA, zoo::rollapplyr(x, width = 2, FUN = mean, fill = NA)[-length(x)])
})
df
#> ID Day Person Cost Prev2
#> 1 1 1 A 50 NA
#> 2 2 1 B 25 NA
#> 3 3 2 A 30 NA
#> 4 4 3 B 75 NA
#> 5 5 4 A 35 40.0
#> 6 6 4 B 100 50.0
#> 7 7 6 B 65 87.5
#> 8 8 7 A 20 32.5
or without zoo,
df$Prev2 <- ave(df$Cost, df$Person, FUN = function(x){
(c(NA, x[-length(x)]) + c(NA, NA, x[-(length(x) - 1):-length(x)])) / 2
})
which does the same thing. If you want to remove the NA
rows, tack on tidyr::drop_na(Prev2)
or na.omit
.
Upvotes: 2