Reputation: 103
I would like to calculate the difference between consecutive columns in a range of columns using dplyr.
For example, using the iris data set I would want to be able to specify the range Sepal.Width:Petal.Width and have a dataframe that contained the original iris data and the differences between the consecutive columns from Sepal.Width:Petal.Width:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species diff1 diff2
1 5.1 3.5 1.4 0.2 setosa 2.1 1.2
2 4.9 3.0 1.4 0.2 setosa 1.6 1.2
3 4.7 3.2 1.3 0.2 setosa 1.9 1.1
4 4.6 3.1 1.5 0.2 setosa 1.6 1.3
5 5.0 3.6 1.4 0.2 setosa 2.2 1.2
6 5.4 3.9 1.7 0.4 setosa 2.2 1.3
Someone posted a solution loops and lapply (Calculate the difference between consecutive, grouped columns in a data.table) but I'm specifically looking for a dplyr solution.
Upvotes: 0
Views: 693
Reputation: 1
You could also use grepl
and which
to get the column indices..
start <- which(grepl("Sepal.Width", colnames(iris)))
end <- which(grepl("Petal.Width", colnames(iris)))
for (i in start:(end-1)) {
eval(parse(text = paste0("iris$diff",i-1," <- iris[,",i,"]-iris[,",i,"+1]")))
}
Upvotes: 0
Reputation: 886948
Here is an option with tidyverse
. We select
the range of columns, remove the first and last column in to a list
of data.frames, then use reduce
to get the difference between set of equal dimension datasets, and rename the columns
library(dplyr)
library(purrr)
library(stringr)
out <- iris %>%
select(Sepal.Width:Petal.Width) %>%
{list(.[-length(.)], .[-1])} %>%
reduce(`-`) %>%
rename_all(~ str_c("diff", seq_along(.))) %>%
bind_cols(iris, .)
head(out)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species diff1 diff2
#1 5.1 3.5 1.4 0.2 setosa 2.1 1.2
#2 4.9 3.0 1.4 0.2 setosa 1.6 1.2
#3 4.7 3.2 1.3 0.2 setosa 1.9 1.1
#4 4.6 3.1 1.5 0.2 setosa 1.6 1.3
#5 5.0 3.6 1.4 0.2 setosa 2.2 1.2
#6 5.4 3.9 1.7 0.4 setosa 2.2 1.3
Or another approach is to loop through index of columns select
the columns, reduce
it to a single column with -
and bind with the original dataset
map_dfc(3:4, ~ iris %>%
select(.x-1, .x) %>%
transmute(diff = reduce(., `-`))) %>%
bind_cols(iris, .)
Upvotes: 1
Reputation: 66415
Here's a less advanced approach using dplyr
and tidyr
verbs. First I gather the columns for differencing into long format, then take their differences vs. previous column, strip out NA's for the first columns which have no previous column, rename the column, spread out, and attach onto the original.
library(tidyverse)
iris %>%
bind_cols(iris %>%
rowid_to_column() %>%
gather(col, val, Sepal.Width:Petal.Width) %>%
group_by(rowid) %>%
mutate(val = abs(val - lag(val))) %>%
filter(!is.na(val)) %>%
mutate(col = paste0("diff_", col)) %>%
spread(col, val) %>%
select(contains("diff"))
)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species rowid diff_Petal.Length diff_Petal.Width
1 5.1 3.5 1.4 0.2 setosa 1 2.1 1.2
2 4.9 3.0 1.4 0.2 setosa 2 1.6 1.2
3 4.7 3.2 1.3 0.2 setosa 3 1.9 1.1
4 4.6 3.1 1.5 0.2 setosa 4 1.6 1.3
5 5.0 3.6 1.4 0.2 setosa 5 2.2 1.2
6 5.4 3.9 1.7 0.4 setosa 6 2.2 1.3
7 4.6 3.4 1.4 0.3 setosa 7 2.0 1.1
Upvotes: 2