Joost
Joost

Reputation: 163

Subtracting columns in a dataframe (or matrix)

I am trying to do less in Excel and more in R, but get stuck on a simple calculation. I have a dataframe with meter readings over a number of weeks. I need to calculate the consumption in each week, i.e. subtracting a column from the previous column. For instance, in the example below I need to subtract Reading1 from Reading2 and Reading2 from Reading3. My actual data set contains hundreds of readings, so I need to find an easy way to do this.

SerialNo = c(1,2,3,4,5)
Reading1 = c(100, 102, 119, 99, 200)
Reading2 = c(102, 105, 120, 115, 207)
Reading3 = c(107, 109, 129, 118, 209)
df <- data.frame(SerialNo, Reading1, Reading2, Reading3)
df
  SerialNo Reading1 Reading2 Reading3
1      1        100      102      107
2      2        102      105      109
3      3        119      120      129
4      4         99      115      118
5      5        200      207      209

Upvotes: 1

Views: 344

Answers (4)

Ronak Shah
Ronak Shah

Reputation: 388817

We can use apply row-wise to calculate difference between consecutive columns.

temp <- t(apply(df[-1], 1, diff))
df[paste0('ans', seq_len(ncol(temp)))] <- temp
df

#  SerialNo Reading1 Reading2 Reading3 ans1 ans2
#1        1      100      102      107    2    5
#2        2      102      105      109    3    4
#3        3      119      120      129    1    9
#4        4       99      115      118   16    3
#5        5      200      207      209    7    2

Upvotes: 1

Another option is to use a simple for to loop over the columns of your data frame. I think this solution can be easier to understand, specially if you are starting to use R.

#Create a data frame with same rows as your df and number of cols-1
resul<-as.data.frame(matrix(nrow=nrow(df),ncol=(ncol(df)-1)))
#Add the SerialNo column to the first column of results df
resul[,1]<-df[,1]
#Set the name of the first column to SerialNo (as the first colname of df)
colnames(resul)[1]<-colnames(df)[1]

#Loop over the Reading columns of df (from the second column to the last minus 1)
for(i in 2:(ncol(df)-1)){
    #Do the subtraction
    resul[,i] <- df[,i+1]-df[,i]
    #Set the colname for each iteration
    colnames(resul)[i]<-paste0(colnames(df)[i+1],"-",colnames(df)[i])
}

Upvotes: 0

A. Suliman
A. Suliman

Reputation: 13125

df[,paste0(names(df)[3:4], names(df)[2:3])] <- df[,names(df)[3:4]] - df[,names(df)[2:3]] 
df
  SerialNo Reading1 Reading2 Reading3 Reading2Reading1 Reading3Reading2
1        1      100      102      107                2                5
2        2      102      105      109                3                4
3        3      119      120      129                1                9
4        4       99      115      118               16                3
5        5      200      207      209                7                2

PS: I assume columns are ordered 1,2,3,...etc

Upvotes: 1

Paul
Paul

Reputation: 2959

Here's a tidyverse solution that returns a data frame with similar formatting. It converts the data to long format (pivot_longer), applies the lag function, does the subtraction and then widens back to the original format (pivot_wider).

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(Reading1:Reading3,
               names_to = "reading",
               names_prefix = "Reading",
               values_to = "value") %>%
  group_by(SerialNo) %>%
  mutate(offset = lag(value, 1),
         measure = value - offset) %>%
  select(SerialNo, reading, measure) %>%
  pivot_wider(names_from = reading,
              values_from = measure,
              names_prefix = "Reading")

>
# A tibble: 5 x 4
# Groups:   SerialNo [5]
  SerialNo Reading1 Reading2 Reading3
     <dbl>    <dbl>    <dbl>    <dbl>
1        1       NA        2        5
2        2       NA        3        4
3        3       NA        1        9
4        4       NA       16        3
5        5       NA        7        2

Upvotes: 1

Related Questions