Syed Ahmed
Syed Ahmed

Reputation: 209

Sort, calculate and then mutate in a dataframe

I'm a beginner in R and I'm facing an issue.

Problem: I need to sort a dataframe by 2 columns (ID, i'th column) and then take lagged difference of the i'th column and record it. Then resort the data with the ID and the i+1 column and so on and so forth.

What I have written up till now:

for (val in (4:length(colnames(df)))){
    df <- df[with(df, order(ID, df[val])), ]
    d2_df <- df %>% 
      mutate_at(c(df[val]), list(lagged = ~ . - lag(.)))
  }

The above code is messing somehow because the mutate_at function is throwing the error below:

Error: `.vars` must be a character/numeric vector or a `vars()` object, not a list.

Original dataset:

  ID S1 S2
1 1  3  1
2 1  5  2
3 1  1  3
4 2  2  7
5 3  4  9
6 3  2  11

After Sort on ID and S1

  ID S1 S2
1 1  1  3
2 1  3  1
3 1  5  2
4 2  2  7
5 3  2  11
6 3  4  9

Now what I need? S1.1 (which is the lagged difference of the sorted dataframe respective to each ID)

  ID S1 S2 S1.1
1 1  1  3  NA
2 1  3  1  2
3 1  5  2  2
4 2  2  7  NA
5 3  2  11 NA
6 3  4  9  2

Similar logic applies for S2 where a new S2.2 will be generated. Any help would be immensely appreciated.

Additionally what is required (below); where sum.S1 is the sum of the lagged differences and count.S1 is the count of observations at S1 for respective ID:

  ID sum.S1 sum.S2 count.S1 count.S2
1 1  4      2      3        3
2 2  NA     NA     1        1
3 3  2      2      2        2

Upvotes: 0

Views: 193

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389235

Here's a way using non-standard evaluation (NSE) :

library(dplyr)
library(purrr)
library(rlang)

cols <- c('S1', 'S2')
bind_cols(df, map_dfc(cols, ~{
     col <- sym(.x)
     df %>% 
       arrange(ID, !!col) %>%
       group_by(ID) %>%
       transmute(!!paste0(.x, '.1') := !!col - lag(!!col)) %>%
       ungroup %>%
       select(-ID)
 }))


#  ID S1 S2 S1.1 S2.1
#1  1  3  1   NA   NA
#2  1  5  2    2    1
#3  1  1  3    2    1
#4  2  2  7   NA   NA
#5  3  4  9   NA   NA
#6  3  2 11    2    2

data

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), S1 = c(3L, 5L, 
1L, 2L, 4L, 2L), S2 = c(1L, 2L, 3L, 7L, 9L, 11L)), 
class = "data.frame", row.names = c(NA, -6L))

Upvotes: 1

Related Questions