Reputation: 163
I am trying to do less in Excel and more in R, but get stuck on a simple calculation. I have a dataframe with meter readings over a number of weeks. I need to calculate the consumption in each week, i.e. subtracting a column from the previous column. For instance, in the example below I need to subtract Reading1
from Reading2
and Reading2
from Reading3
. My actual data set contains hundreds of readings, so I need to find an easy way to do this.
SerialNo = c(1,2,3,4,5)
Reading1 = c(100, 102, 119, 99, 200)
Reading2 = c(102, 105, 120, 115, 207)
Reading3 = c(107, 109, 129, 118, 209)
df <- data.frame(SerialNo, Reading1, Reading2, Reading3)
df
SerialNo Reading1 Reading2 Reading3
1 1 100 102 107
2 2 102 105 109
3 3 119 120 129
4 4 99 115 118
5 5 200 207 209
Upvotes: 1
Views: 344
Reputation: 388817
We can use apply
row-wise to calculate difference between consecutive columns.
temp <- t(apply(df[-1], 1, diff))
df[paste0('ans', seq_len(ncol(temp)))] <- temp
df
# SerialNo Reading1 Reading2 Reading3 ans1 ans2
#1 1 100 102 107 2 5
#2 2 102 105 109 3 4
#3 3 119 120 129 1 9
#4 4 99 115 118 16 3
#5 5 200 207 209 7 2
Upvotes: 1
Reputation: 5620
Another option is to use a simple for
to loop over the columns of your data frame. I think this solution can be easier to understand, specially if you are starting to use R.
#Create a data frame with same rows as your df and number of cols-1
resul<-as.data.frame(matrix(nrow=nrow(df),ncol=(ncol(df)-1)))
#Add the SerialNo column to the first column of results df
resul[,1]<-df[,1]
#Set the name of the first column to SerialNo (as the first colname of df)
colnames(resul)[1]<-colnames(df)[1]
#Loop over the Reading columns of df (from the second column to the last minus 1)
for(i in 2:(ncol(df)-1)){
#Do the subtraction
resul[,i] <- df[,i+1]-df[,i]
#Set the colname for each iteration
colnames(resul)[i]<-paste0(colnames(df)[i+1],"-",colnames(df)[i])
}
Upvotes: 0
Reputation: 13125
df[,paste0(names(df)[3:4], names(df)[2:3])] <- df[,names(df)[3:4]] - df[,names(df)[2:3]]
df
SerialNo Reading1 Reading2 Reading3 Reading2Reading1 Reading3Reading2
1 1 100 102 107 2 5
2 2 102 105 109 3 4
3 3 119 120 129 1 9
4 4 99 115 118 16 3
5 5 200 207 209 7 2
PS: I assume columns are ordered 1,2,3,...etc
Upvotes: 1
Reputation: 2959
Here's a tidyverse solution that returns a data frame with similar formatting. It converts the data to long format (pivot_longer
), applies the lag
function, does the subtraction and then widens back to the original format (pivot_wider
).
library(dplyr)
library(tidyr)
df %>%
pivot_longer(Reading1:Reading3,
names_to = "reading",
names_prefix = "Reading",
values_to = "value") %>%
group_by(SerialNo) %>%
mutate(offset = lag(value, 1),
measure = value - offset) %>%
select(SerialNo, reading, measure) %>%
pivot_wider(names_from = reading,
values_from = measure,
names_prefix = "Reading")
>
# A tibble: 5 x 4
# Groups: SerialNo [5]
SerialNo Reading1 Reading2 Reading3
<dbl> <dbl> <dbl> <dbl>
1 1 NA 2 5
2 2 NA 3 4
3 3 NA 1 9
4 4 NA 16 3
5 5 NA 7 2
Upvotes: 1