Stupid_Intern
Stupid_Intern

Reputation: 3450

Calculation involving Interdependent columns in dataframe

Data <- data.frame(
  A = rnorm(1:10),
  B = rnorm(1:10),
  C = rnorm(1:10),
  D = rnorm(1:10),
  X = sample(c(1,0), size=10, replace = TRUE)
)

In the Data dataframe I need to create two columns Y and Z where

First observation of Y will be equal to a constant let's say 10

Second and subsequent observations of Y i.e from row 2 to 10 in this case will be equal = 10 + Z[previous observation]

example

if Z[1] = 10 then Y[2] = 10 + Z[1] = 20

if Z[2] = 20 then Y[3] = 10 + Z[2] = 30

if Z[3] = 50 then Y[4] = 10 + Z[3] = 60

and so on till the length of Data dataframe.


Now as far as Z is concerned it is calculated as follows:

First observation of Z is IF(X1>0,0,Y1-SUM(A1+B1+C1+D1))
Second observation of Z is IF(X2>0,0,Y2-SUM(A2+B2+C2+D2))
Third observation of Z is IF(X3>0,0,Y3-SUM(A3+B3+C3+D3))

i.e it checks if X observation from same row is > 0 it is then Z for that row is 0 otherwise it is Y1 - sum(A:D) for the same row

and so on till length of Data frame.

Upvotes: 3

Views: 119

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389055

As there is an interdependency in every row on the previous row used for loop here

Data$Y <- 10
Data$Z <- 0

for (i in seq_len(nrow(Data))) {
   if(i > 1)
    Data$Y[i] <- Data$Y[1] + Data$Z[i-1]

   if(Data$X[i] > 0) 
      Data$Z[i] <- 0 
   else
      Data$Z[i] <- with(Data, Y[i] - (A[i]+B[i]+C[i]+D[i]))
}

Data
#            A           B          C          D X        Y         Z
#1   0.6607697  0.05785971  0.2053518 -0.3366816 0 10.00000  9.412700
#2  -2.0529830  1.10397550  1.0162833 -0.6592560 0 19.41270 20.004681
#3  -1.4992061 -0.02561697  0.4089990 -0.1007670 1 30.00468  0.000000
#4   1.4712331  0.51484639 -0.7052355  0.7039078 0 10.00000  8.015248
#5   1.4591385  0.99005668  0.2356143  0.4380549 0 18.01525 14.892384
#6   0.1401390  0.30345432  0.3451300  1.4971479 0 24.89238 22.606512
#7   0.2091844 -0.93007223 -1.0197996  0.2162444 1 32.60651  0.000000
#8  -3.0360898  0.08403068 -1.4118223 -1.4953378 0 10.00000 15.859219
#9  -0.4869341  0.52677963 -1.3638423  0.1249940 1 25.85922  0.000000
#10 -1.0878673  0.01586862  0.8731049 -1.0380945 0 10.00000 11.236988

data

set.seed(234)
Data <- data.frame(
   A = rnorm(1:10),
   B = rnorm(1:10),
   C = rnorm(1:10),
   D = rnorm(1:10),
   X = sample(c(1,0), size=10, replace = TRUE)
)

Upvotes: 3

Related Questions