Reputation: 59

R data.frame add a column depending on row-values

In R, I have a data.frame, myFrame, with two numeric columns, X and Y (the first two columns of the table below).

Now I want a new column, let's call it "SumX", that adds each value of X to the value in the row before it, and a "SumY" column that does the same for Y. So the resulting data.frame would look like this:

X   Y   SumX                 SumY
20  7   20   #first row = X  7   #first row = Y
25  84  45   #X0 + X1        91  #Y0 + Y1
15  62  40   #X1 + X2        146 #Y1 + Y2
22  12  37   #X2 + X3        74  #Y2 + Y3
60  24  82   #X3 + X4        36  #Y3 + Y4
40  10  100  #X4 + X5        34  #Y4 + Y5
60  60  100  #and so on      70  #and so on
12  50  72                   110
11  17  23                   67

I can do simple X + Y into a new column with

myFrame$SumXY <- with(myFrame, X+Y)

but is there a simple way to add two X values (n + (n-1)) into SumX, and two Y values (n + (n-1)) into SumY? Even a while-loop would do, though I would prefer a simpler way (it's a lot of data like this). Any help is much appreciated! (I'm still pretty new to R.)

Upvotes: 1

Answers (4)

Simon Jackson

Reputation: 3174

Here's a dplyr approach.

Use mutate() to add a new column, and var + lag(var, default = 0) to compute it. Example:

library(dplyr)

d <- data.frame(
  x = 1:10,
  y = 11:20,
  z = 21:30
)

mutate(d, sumx = x + lag(x, default = 0))

#>     x  y  z sumx
#> 1   1 11 21    1
#> 2   2 12 22    3
#> 3   3 13 23    5
#> 4   4 14 24    7
#> 5   5 15 25    9
#> 6   6 16 26   11
#> 7   7 17 27   13
#> 8   8 18 28   15
#> 9   9 19 29   17
#> 10 10 20 30   19

More variables can be handled similarly:

mutate(d, sumx = x + lag(x, default = 0), sumy = y + lag(y, default = 0))
#>     x  y  z sumx sumy
#> 1   1 11 21    1   11
#> 2   2 12 22    3   23
#> 3   3 13 23    5   25
#> 4   4 14 24    7   27
#> 5   5 15 25    9   29
#> 6   6 16 26   11   31
#> 7   7 17 27   13   33
#> 8   8 18 28   15   35
#> 9   9 19 29   17   37
#> 10 10 20 30   19   39

If you know that you want to do this for many, or even every, column in your data frame, here's a standard-evaluation approach with mutate_() that uses a custom function I adapted from this blog post (note that you need the lazyeval package installed, and that mutate_() is deprecated in current dplyr releases). The function is applied to each column in a for loop (which could probably be optimised).

f <- function(df, col, new_col_name) {
  mutate_call <- lazyeval::interp(~ x + lag(x, default = 0), x = as.name(col))
  df %>% mutate_(.dots = setNames(list(mutate_call), new_col_name))
}

for (var in names(d)) {
  d <- f(d, var, paste0('sum', var))
}

d
#>     x  y  z sumx sumy sumz
#> 1   1 11 21    1   11   21
#> 2   2 12 22    3   23   43
#> 3   3 13 23    5   25   45
#> 4   4 14 24    7   27   47
#> 5   5 15 25    9   29   49
#> 6   6 16 26   11   31   51
#> 7   7 17 27   13   33   53
#> 8   8 18 28   15   35   55
#> 9   9 19 29   17   37   57
#> 10 10 20 30   19   39   59
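Since mutate_() and lazyeval are deprecated in newer dplyr versions, the same "every column" idea can probably be written with across() instead. This is only a sketch, assuming dplyr 1.0.0 or later; the names d_fresh and d_sums are made up here so the example does not overwrite d.

```r
library(dplyr)

# Hypothetical fresh copy of the example data (same values as `d` above)
d_fresh <- data.frame(
  x = 1:10,
  y = 11:20,
  z = 21:30
)

# Apply "value + previous value" to every column in one call.
# .names = "sum{.col}" builds the new names sumx, sumy, sumz.
d_sums <- d_fresh %>%
  mutate(across(everything(),
                ~ .x + lag(.x, default = 0),
                .names = "sum{.col}"))

d_sums
```

To restrict it to some columns, replace everything() with a selection such as c(x, y).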

Just to continue the tidyverse theme, here's a solution using the purrr package (again, it works for all columns, but you can subset columns if you need to):

library(purrr)

# Create new columns in new data frame (starting from the original x, y, z).
# Subset `d` here if you only want selected columns.
# dplyr::lag is named explicitly: stats::lag does not shift a plain vector.
sum_d <- map_df(d, ~ . + dplyr::lag(., default = 0))

# Set names correctly and
# bind back to original data
names(sum_d) <- paste0("sum", names(sum_d))
d <- cbind(d, sum_d)
d
#>     x  y  z sumx sumy sumz
#> 1   1 11 21    1   11   21
#> 2   2 12 22    3   23   43
#> 3   3 13 23    5   25   45
#> 4   4 14 24    7   27   47
#> 5   5 15 25    9   29   49
#> 6   6 16 26   11   31   51
#> 7   7 17 27   13   33   53
#> 8   8 18 28   15   35   55
#> 9   9 19 29   17   37   57
#> 10 10 20 30   19   39   59

Upvotes: 2

amccnnll

Reputation: 397

The rollapply function from the zoo package will work here.

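The following code creates the rolling sum of each pair of adjacent values. A plain rollapply(myFrame$X, 2, sum) returns one value fewer than there are rows, so it cannot be assigned as a column directly; rollapplyr() with partial = TRUE keeps the first row as X[1] on its own.

library(zoo)
# right-aligned rolling sum of every 2 values; partial = TRUE keeps row 1 as X[1]
myFrame$SumX <- rollapplyr(myFrame$X, 2, sum, partial = TRUE)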

You could add by = 2 as an argument to rollapply to get non-overlapping sums instead of a rolling sum (i.e. it sums values 1+2, then 3+4, then 5+6, etc.).

Look up ?rollapply for more info.

Upvotes: 3

d.b

Reputation: 32548

#SumX
cumsum(df$X) - c(0, 0, cumsum(df$X)[1:(nrow(df)-2)])
#[1]  20  45  40  37  82 100 100  72  23

#SumY
cumsum(df$Y) - c(0, 0, cumsum(df$Y)[1:(nrow(df)-2)])
#[1]   7  91 146  74  36  34  70 110  67
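
The cumsum() lines above work because cumsum(X)[i] - cumsum(X)[i-2] leaves exactly X[i] + X[i-1]. A minimal base R sketch of the same result, written as "column plus a copy of itself shifted down one row", might look like this. It assumes the data frame is called myFrame, as in the question, and rebuilds it from the X and Y values shown there.

```r
# Data rebuilt from the X and Y columns in the question
myFrame <- data.frame(
  X = c(20, 25, 15, 22, 60, 40, 60, 12, 11),
  Y = c(7, 84, 62, 12, 24, 10, 60, 50, 17)
)

n <- nrow(myFrame)

# c(0, v[-n]) is v shifted down by one row, with 0 padding the first row,
# so the first result is just the first value itself
myFrame$SumX <- myFrame$X + c(0, myFrame$X[-n])
myFrame$SumY <- myFrame$Y + c(0, myFrame$Y[-n])

myFrame
```

Unlike the cumsum() version, this form also works for a one- or two-row data frame, since it does not index 1:(nrow - 2).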

Upvotes: 1

salient

Reputation: 2486

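You can use dplyr's lag() function to achieve something like this (base R's stats::lag() does not shift a plain vector, so it is named explicitly here):

n <- nrow(myFrame)
myFrame$SumX <- myFrame$X  # row 1 keeps its own value
myFrame$SumX[2:n] <- myFrame$X[2:n] + dplyr::lag(myFrame$X)[2:n]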

Upvotes: 1
