Elisavet Za

Reputation: 33

lag-1 difference between columns of a data frame or matrix in R

I would like to write a function that calculates the lag-1 difference between multiple columns in R.

For example, my data frame looks like this:

  id  Value  Value2 Value3 Value4  
A234     10     15     NA     NA 
B345     20     25     25     30 
C500     20     25     15     NA

I would like the function to take the difference between the 5th and 4th columns, then the 4th and 3rd columns, and finally the 3rd and 2nd columns.

I am aware of two previous Q&As on taking differences between rows, but I can't adapt the solutions to deal with columns. Sorry if this is too simple. I am a newbie in R.


df <- structure(list(id = c("A234", "B345", "C500"), Value = c(10L, 
20L, 20L), Value2 = c(15L, 25L, 25L), Value3 = c(NA, 25L, 15L
), Value4 = c(NA, 30L, NA)), .Names = c("id", "Value", "Value2", 
"Value3", "Value4"), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 3

Views: 6871

Answers (3)

Zheyuan Li

Reputation: 73265

data frame

I would like the function to take the difference between the 5th and 4th columns, then the 4th and 3rd columns, and finally the 3rd and 2nd columns.

We can do

cbind(df[1], df[3:5] - df[2:4])
#    id Value2 Value3 Value4
#1 A234      5     NA     NA
#2 B345      5      0      5
#3 C500      5    -10     NA

df[3:5] - df[2:4] works because element-wise arithmetic is well-defined in R between two data frames of the same size. In particular, the result of DF1 - DF2 inherits the column names of the first data frame, DF1.

We can also use negative indexing:

df0 <- df[-1]  ## drop "id" column
cbind(df[1], df0[-1] - df0[-length(df0)])
#    id Value2 Value3 Value4
#1 A234      5     NA     NA
#2 B345      5      0      5
#3 C500      5    -10     NA
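Since the question asks for a function, the negative-indexing idea could be wrapped up roughly as follows. This is only a sketch: the name lag1_col_diff and the id_cols argument are made up for illustration, and it assumes that every column other than the identifier column(s) is numeric.

```r
# Hypothetical helper (not part of the original answer): lag-1 differences
# between adjacent columns, keeping the identifier column(s) in front.
lag1_col_diff <- function(df, id_cols = 1) {
  vals <- df[-id_cols]                  # drop the identifier column(s)
  stopifnot(ncol(vals) >= 2)            # need at least two columns to difference
  d <- vals[-1] - vals[-ncol(vals)]     # column k+1 minus column k
  cbind(df[id_cols], d)
}

# Example data from the question
df <- data.frame(id     = c("A234", "B345", "C500"),
                 Value  = c(10L, 20L, 20L),
                 Value2 = c(15L, 25L, 25L),
                 Value3 = c(NA, 25L, 15L),
                 Value4 = c(NA, 30L, NA))

res <- lag1_col_diff(df)
res
```

On the example data, res should match the table shown above.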

caveat:

Since a data frame may store different types of data in different columns, I advise checking the column types before taking differences; otherwise the arithmetic may be invalid. With your example data frame, we can do

sapply(df, class)
#         id       Value      Value2      Value3      Value4 
#"character"   "integer"   "integer"   "integer"   "integer" 

So taking differences between the last 4 columns is valid.

Here is another example with the iris dataset:

sapply(iris, class)
#Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

The last column is a "factor", which cannot be used for valid arithmetic.

Note that we use class rather than mode for type checking on each data frame column, as it does a more comprehensive check. See this Q & A for more explanation.
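If you would rather have the check done in code than by eye, one option is to keep only the columns that pass is.numeric before differencing. This is a sketch of an alternative to inspecting class by hand, and it assumes you want non-numeric columns silently left out rather than reported as an error:

```r
# Keep only the numeric columns of iris, then take lag-1 column differences.
is_num <- vapply(iris, is.numeric, logical(1))   # FALSE for the factor Species
num <- iris[is_num]
d <- num[-1] - num[-ncol(num)]                   # column k+1 minus column k
names(d)
```

The result keeps the names of the left-hand operand, so its columns are named after the 2nd to 4th numeric columns of iris.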


matrix

A matrix can only hold a single type of data. Use mode to check the data type and make sure that arithmetic is valid. For example, you can't do arithmetic on "character" data.

Suppose we have a "numeric" matrix

set.seed(0)
A <- round(matrix(runif(25), 5, 5), 2)
#     [,1] [,2] [,3] [,4] [,5]
#[1,] 0.90 0.20 0.06 0.77 0.78
#[2,] 0.27 0.90 0.21 0.50 0.93
#[3,] 0.37 0.94 0.18 0.72 0.21
#[4,] 0.57 0.66 0.69 0.99 0.65
#[5,] 0.91 0.63 0.38 0.38 0.13

mode(A)
#[1] "numeric"

We can use the following to take the difference between column 2 and column 1, column 3 and column 2, etc.:

A[, -1, drop = FALSE] - A[, -ncol(A), drop = FALSE]
#      [,1]  [,2] [,3]  [,4]
#[1,] -0.70 -0.14 0.71  0.01
#[2,]  0.63 -0.69 0.29  0.43
#[3,]  0.57 -0.76 0.54 -0.51
#[4,]  0.09  0.03 0.30 -0.34
#[5,] -0.28 -0.25 0.00 -0.25
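As a cross-check, the same result can be obtained with base R's diff. On a matrix, diff takes differences between successive rows, so transposing before and after turns that into column differences. A small sketch using the same matrix A (the names d1 and d2 are only for illustration):

```r
set.seed(0)
A <- round(matrix(runif(25), 5, 5), 2)

# Indexing approach from above
d1 <- A[, -1, drop = FALSE] - A[, -ncol(A), drop = FALSE]

# diff() on a matrix differences successive rows, so transpose around it
d2 <- t(diff(t(A)))

all.equal(d1, d2)   # the two approaches should agree
```

The indexing version avoids the two transposes, which is one reason to prefer it for large matrices.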

Upvotes: 4

Stelios Serghiou

Reputation: 477

I think that for your purposes Zheyuan Li's answer is the simplest and most elegant. However, I just wanted to show how you could also solve this using my go-to function for differences between values, diff. Note that diff is part of base R, so no extra package needs to be loaded for this call. In my answer I have retained the order you requested (col 5 - col 4, col 4 - col 3, etc.).

# diff() is in base R, so no library() call is needed

# Create a dataframe of differences
cbind(df[1], rev(as.data.frame(t(apply(df[-1], 1, diff, 1)))))
#    id Value4 Value3 Value2
#1 A234     NA     NA      5
#2 B345      5      0      5
#3 C500     NA    -10      5

Upvotes: 0

CCurtis

Reputation: 1932

I didn't test this, but assuming your data.frame is called df, it could be as simple as cbind(df$Value2 - df$Value, df$Value3 - df$Value2, df$Value4 - df$Value3). You could wrap that in a function very easily.

Upvotes: 0
