Reputation: 33
I would like to write a function that calculates the lag-1 difference between multiple columns in R.
For example, my data frame looks like this:
id Value Value2 Value3 Value4
A234 10 15 NA NA
B345 20 25 25 30
C500 20 25 15 NA
I would like the function to take the difference between the 5th and 4th columns, then between the 4th and 3rd columns, and finally between the 3rd and 2nd columns.
I am aware of two previous Q&As on taking differences between rows, but I can't adapt those solutions to work across columns. Sorry if this is too simple; I am a newbie in R.
df <- structure(list(id = c("A234", "B345", "C500"), Value = c(10L,
20L, 20L), Value2 = c(15L, 25L, 25L), Value3 = c(NA, 25L, 15L
), Value4 = c(NA, 30L, NA)), .Names = c("id", "Value", "Value2",
"Value3", "Value4"), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 3
Views: 6871
Reputation: 73265
I would like the function to take the difference between the 5th and 4th column. Then, the 4th and 3rd column and then last the 3rd and 2nd column.
We can do
cbind(df[1], df[3:5] - df[2:4])
# id Value2 Value3 Value4
#1 A234 5 NA NA
#2 B345 5 0 5
#3 C500 5 -10 NA
df[3:5] - df[2:4] works because element-wise arithmetic is well-defined in R between two data frames of the same size. In particular, the result of DF1 - DF2 inherits the column names of the first data frame, DF1.
We can also use negative indexing:
df0 <- df[-1] ## drop "id" column
cbind(df[1], df0[-1] - df0[-length(df0)])
# id Value2 Value3 Value4
#1 A234 5 NA NA
#2 B345 5 0 5
#3 C500 5 -10 NA
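Since the question asks for a function, the negative-indexing approach above could be wrapped up roughly as follows. This is only a sketch: the name lag1_col_diff and its id_col argument are made up for illustration, and it assumes that every column other than the id column is numeric.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer):
# lag-1 differences across the non-id columns of a data frame.
lag1_col_diff <- function(df, id_col = 1) {
  vals <- df[-id_col]                   # drop the id column
  stopifnot(ncol(vals) >= 2)            # need at least two columns to difference
  out <- vals[-1] - vals[-ncol(vals)]   # column k+1 minus column k
  cbind(df[id_col], out)                # put the id column back in front
}

# The example data frame from the question
df <- data.frame(id = c("A234", "B345", "C500"),
                 Value = c(10L, 20L, 20L), Value2 = c(15L, 25L, 25L),
                 Value3 = c(NA, 25L, 15L), Value4 = c(NA, 30L, NA))
res <- lag1_col_diff(df)
```

The result keeps the names of the "later" column in each pair (Value2, Value3, Value4), for the reason explained above about DF1 - DF2.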
Caveat:
Since a data frame may store data of different types in different columns, I advise that you check the column types before taking differences; otherwise the arithmetic may be invalid. With your example data frame, we can do
sapply(df, class)
# id Value Value2 Value3 Value4
#"character" "integer" "integer" "integer" "integer"
So taking difference between the last 4 columns is valid.
Here is another example with the iris dataset:
sapply(iris, class)
#Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "numeric" "numeric" "numeric" "numeric" "factor"
The last column is a "factor", which cannot be used in arithmetic.
Note that we use class rather than mode for type checking on each data frame column, as it does a more comprehensive check. See this Q & A for more explanation.
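The check above can also be turned into an automatic filter. The sketch below is one possible way to do it, not part of the original answer: it uses is.numeric (which is TRUE for integer and double columns and FALSE for factors and characters) to keep only the columns that support subtraction. The variable names are illustrative.

```r
# Flag the columns that can take part in arithmetic
num_cols <- vapply(iris, is.numeric, logical(1))

# Keep only those columns, then take lag-1 differences across them
iris_num <- iris[num_cols]
d <- iris_num[-1] - iris_num[-ncol(iris_num)]

names(d)   # names come from the "later" column of each pair
```

Here the "factor" column Species is dropped before the subtraction, so the arithmetic is valid.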
A matrix can only hold a single type of data. Use mode to check the data type to ensure that arithmetic is valid. For example, you can't do arithmetic on "character" data.
Suppose we have a "numeric" matrix
set.seed(0)
A <- round(matrix(runif(25), 5, 5), 2)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.90 0.20 0.06 0.77 0.78
#[2,] 0.27 0.90 0.21 0.50 0.93
#[3,] 0.37 0.94 0.18 0.72 0.21
#[4,] 0.57 0.66 0.69 0.99 0.65
#[5,] 0.91 0.63 0.38 0.38 0.13
mode(A)
#[1] "numeric"
We can use the following to take the difference between column 2 and column 1, column 3 and column 2, etc.:
A[, -1, drop = FALSE] - A[, -ncol(A), drop = FALSE]
# [,1] [,2] [,3] [,4]
#[1,] -0.70 -0.14 0.71 0.01
#[2,] 0.63 -0.69 0.29 0.43
#[3,] 0.57 -0.76 0.54 -0.51
#[4,] 0.09 0.03 0.30 -0.34
#[5,] -0.28 -0.25 0.00 -0.25
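As a cross-check, the same column differences can be obtained with base R's diff, which works down the rows of a matrix, so transposing before and after gives differences across columns. The small matrix below is made up for illustration so that the numbers are easy to verify by hand.

```r
# A small illustrative matrix (not the random one above)
A <- matrix(c(1, 4, 9, 16,
              2, 3, 5,  8), nrow = 2, byrow = TRUE)

# Direct subtraction of shifted column blocks
d1 <- A[, -1, drop = FALSE] - A[, -ncol(A), drop = FALSE]

# diff() differences along rows, so transpose, difference, transpose back
d2 <- t(diff(t(A)))

all.equal(d1, d2)
```

Both approaches give a matrix with one fewer column than A; the direct subtraction avoids the two transposes.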
Upvotes: 4
Reputation: 477
I think that for your purposes Zheyuan Li's answer is the simplest and most elegant. However, I just wanted to show how you could also solve this using my go-to function for differences between values, diff. Note that diff is part of base R, so loading the Matrix package, as I do below, is not strictly required for this call. In my answer I have retained the order you requested (col 5 - col 4, col 4 - col 3, etc.).
# Load package
library(Matrix)
# Create a dataframe of differences
cbind(df[1], rev(as.data.frame(t(apply(df[-1], 1, diff, 1)))))
# id Value4 Value3 Value2
#1 A234 NA NA 5
#2 B345 5 0 5
#3 C500 NA -10 5
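For comparison, the same reversed column order can be reached with plain data frame subtraction followed by reversing the column index. This is a sketch that assumes the columns to difference are columns 2 to 5, as in the question's data; the variable names are illustrative.

```r
# The example data frame from the question
df <- data.frame(id = c("A234", "B345", "C500"),
                 Value = c(10L, 20L, 20L), Value2 = c(15L, 25L, 25L),
                 Value3 = c(NA, 25L, 15L), Value4 = c(NA, 30L, NA))

d <- df[3:5] - df[2:4]              # Value2, Value3, Value4 differences
res <- cbind(df[1], d[ncol(d):1])   # reverse the columns: Value4, Value3, Value2
```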
Upvotes: 0
Reputation: 1932
I didn't test this, but assuming your data.frame is called df, it could be as simple as
cbind(df$Value2-df$Value,df$Value3-df$Value2,df$Value4-df$Value3)
You could wrap that in a function very easily.
Upvotes: 0