Reputation: 209
I'm a beginner in R and I'm facing an issue while calculating row differences by group for multiple columns.
What I wrote up till now:
dt2 <- function(dataset){
s_list <- c("S17", "S1", "S3", "S5", "S7")
for (val in s_list){
dt2_df <- dataset %>%
group_by(order) %>%
mutate(noquote(val) - lag(noquote(val)))
dt2_df$s_list[val] <- NULL
}
return(dt2_df)
}
The s_list is basically the column names of my dataset (for which I need row differences). If I'm doing it separately for each column then I'm getting the right answer (but the code is awfully long - using the mutate function repeatedly).
What I need: A dataframe which only contains the row wise differences for each column mentioned in the s_list and the groupid (order)
What is wrong with the above code: I get the error below:
Error: Problem with `mutate()` input `..1`.
x non-numeric argument to binary operator
i Input `..1` is `noquote(val) - lag(noquote(val))`.
i The error occured in group 1: order = 4779830.
Additional information: The values of each column in s_list contains timestamp information and "NA"(for no observation).
Any help would be highly appreciated.
What does work:
delta2_df <- df %>%
group_by(order) %>%
mutate(S17 - lag(S17)) %>%
mutate(S1 - lag(S1)) %>%
mutate(S3 - lag(S3)) %>%
mutate(S5 - lag(S5)) %>%
mutate(S7 - lag(S7))
But this will get awfully long because I have 18 columns. (The s_list only shows the first 5).
Upvotes: 2
Views: 213
Reputation: 3923
You don't really need a function or anything fancy since you already are using dplyr
I'm assuming what your data looks like but this should do it. (I'm using the older mutate_at
rather than across
because I don't want to assume you have the latest version of dplyr
. FWIW the specific error message you generated with your function is because noquote
is really only helpful for printing, it doesn't actually modify the string "S17" to make it the variable name you need.
library(dplyr)
# Your expected outcome
# df %>%
# # group_by(order) %>%
# mutate(S17 - lag(S17)) %>%
# mutate(S1 - lag(S1)) %>%
# mutate(S3 - lag(S3)) %>%
# mutate(S5 - lag(S5)) %>%
# mutate(S7 - lag(S7))
delta2_df <-
df %>%
mutate_at(c("S17", "S1", "S3", "S5", "S7"), list(lagged = ~ . - lag(.)))
delta2_df %>%
select(ends_with("_lagged")) %>%
slice_head(n = 5)
#> S17_lagged S1_lagged S3_lagged S5_lagged S7_lagged
#> 1 NA secs NA secs NA secs NA secs NA secs
#> 2 3.459238 secs 35.5398948 secs 38.31058 secs 7.582288 secs 4.819725 secs
#> 3 1.562570 secs 9.0936852 secs 44.25048 secs 41.544859 secs 5.264784 secs
#> 4 30.447722 secs 0.2308176 secs 50.53320 secs 4.032063 secs 63.913645 secs
#> 5 165.459635 secs 2.2446895 secs 13.80334 secs 7.527049 secs 21.084198 secs
Based on this data...
set.seed(2020)
df <- data.frame(
S17 = sort(as.POSIXct(" 2020-07-08 07:00:00") + runif(n=100, min=0, max=3600)),
S1 = sort(as.POSIXct(" 2020-07-08 08:00:00") + runif(n=100, min=0, max=3600)),
S3 = sort(as.POSIXct(" 2020-07-08 09:00:00") + runif(n=100, min=0, max=3600)),
S5 = sort(as.POSIXct(" 2020-07-08 10:00:00") + runif(n=100, min=0, max=3600)),
S7 = sort(as.POSIXct(" 2020-07-08 11:00:00") + runif(n=100, min=0, max=3600))
)
Upvotes: 1