Reputation: 103

How to calculate the difference between consecutive sets of two columns in a column range using dplyr

I would like to calculate the difference between consecutive columns in a range of columns using dplyr.

For example, using the iris data set I would want to be able to specify the range Sepal.Width:Petal.Width and have a dataframe that contained the original iris data and the differences between the consecutive columns from Sepal.Width:Petal.Width:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species diff1 diff2
1          5.1         3.5          1.4         0.2  setosa   2.1   1.2
2          4.9         3.0          1.4         0.2  setosa   1.6   1.2
3          4.7         3.2          1.3         0.2  setosa   1.9   1.1
4          4.6         3.1          1.5         0.2  setosa   1.6   1.3
5          5.0         3.6          1.4         0.2  setosa   2.2   1.2
6          5.4         3.9          1.7         0.4  setosa   2.2   1.3

Someone posted a solution loops and lapply (Calculate the difference between consecutive, grouped columns in a data.table) but I'm specifically looking for a dplyr solution.

Upvotes: 0

Answers (3)

jgjiggle

Reputation: 1

You could also use grepl and which to get the column indices..

start <- which(grepl("Sepal.Width", colnames(iris)))
end <- which(grepl("Petal.Width", colnames(iris)))

for (i in start:(end-1)) {
eval(parse(text = paste0("iris$diff",i-1," <- iris[,",i,"]-iris[,",i,"+1]")))
}

Upvotes: 0

akrun

Reputation: 886948

Here is an option with tidyverse. We select the range of columns, remove the first and last column in to a list of data.frames, then use reduce to get the difference between set of equal dimension datasets, and rename the columns

library(dplyr)
library(purrr)
library(stringr)
out <- iris %>%
      select(Sepal.Width:Petal.Width) %>% 
      {list(.[-length(.)], .[-1])} %>% 
      reduce(`-`) %>% 
      rename_all(~ str_c("diff", seq_along(.))) %>% 
      bind_cols(iris, .)

head(out)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species diff1 diff2
#1          5.1         3.5          1.4         0.2  setosa   2.1   1.2
#2          4.9         3.0          1.4         0.2  setosa   1.6   1.2
#3          4.7         3.2          1.3         0.2  setosa   1.9   1.1
#4          4.6         3.1          1.5         0.2  setosa   1.6   1.3
#5          5.0         3.6          1.4         0.2  setosa   2.2   1.2
#6          5.4         3.9          1.7         0.4  setosa   2.2   1.3

Or another approach is to loop through index of columns select the columns, reduce it to a single column with - and bind with the original dataset

map_dfc(3:4, ~ iris %>%
                select(.x-1, .x) %>% 
                transmute(diff = reduce(., `-`))) %>% 
     bind_cols(iris, .)

Upvotes: 1

Jon Spring

Reputation: 66415

Here's a less advanced approach using dplyr and tidyr verbs. First I gather the columns for differencing into long format, then take their differences vs. previous column, strip out NA's for the first columns which have no previous column, rename the column, spread out, and attach onto the original.

library(tidyverse)
iris %>%
  bind_cols(iris %>%
    rowid_to_column() %>%
    gather(col, val, Sepal.Width:Petal.Width) %>%
    group_by(rowid) %>%
    mutate(val = abs(val - lag(val))) %>%
    filter(!is.na(val)) %>%
    mutate(col = paste0("diff_", col)) %>%
    spread(col, val)  %>%
    select(contains("diff"))
  )


    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species rowid diff_Petal.Length diff_Petal.Width
1            5.1         3.5          1.4         0.2     setosa     1               2.1              1.2
2            4.9         3.0          1.4         0.2     setosa     2               1.6              1.2
3            4.7         3.2          1.3         0.2     setosa     3               1.9              1.1
4            4.6         3.1          1.5         0.2     setosa     4               1.6              1.3
5            5.0         3.6          1.4         0.2     setosa     5               2.2              1.2
6            5.4         3.9          1.7         0.4     setosa     6               2.2              1.3
7            4.6         3.4          1.4         0.3     setosa     7               2.0              1.1

Upvotes: 2

How to calculate the difference between consecutive sets of two columns in a column range using dplyr

Answers (3)

Related Questions