patrick

Reputation: 380

Lag of the value, take the previous value logic

Here is what I am trying to achieve in the whatiswant column:

df1 <- data.frame(value = c(99.99,99.98,99.97,99.96,99.95,99.94,
                            99.93,99.92,99.91,99.9,99.9,99.9),
                  new_value = c(NA,NA,99.98,NA,99.97,NA,
                                NA,NA,NA,NA,NA,NA),
                  whatiswant = c(99.99,99.96,99.98,99.95,99.97,99.94,
                                 99.93,99.92,99.91,99.9,99.9,99.9))

To explain it in words: whatiswant should take the value of new_value where one is present. Rows without a new_value should take the next-lowest value from value that has not been used yet.

I think it is some kind of lag operation. Here is the data.frame:

   value new_value whatiswant
1  99.99        NA      99.99
2  99.98        NA      99.96
3  99.97     99.98      99.98
4  99.96        NA      99.95
5  99.95     99.97      99.97
6  99.94        NA      99.94
7  99.93        NA      99.93
8  99.92        NA      99.92
9  99.91        NA      99.91
10 99.90        NA      99.90
11 99.90        NA      99.90
12 99.90        NA      99.90

EDIT: The logic, explained step by step (col1 = value, col2 = new_value, col3 = whatiswant):

  1. If new_value is not NA, col3 takes that value. This settles the 3rd and 5th rows.
  2. Row 1: col3 takes the value of col1, as col2 is NA.
  3. Row 2: col3 takes the value of col1 row 4, as the col1 values of rows 2 and 3 were already used in step 1.
  4. Row 4: col3 takes the value of col1 row 5, as all col1 values from the rows above were taken in previous steps.
  5. Rows 6-12: col3 takes the same value as col1 in rows 6-12, as col2 is NA and none of those col1 values were used in previous steps.

Upvotes: 0

Views: 169

Answers (1)

Tensibai

Reputation: 15784

Here it is as a function, with each step commented. Ask if anything is unclear:

t1 <- function(df) {
  df[,'whatiswant'] <- df[,'new_value'] # step 1, use value of new_value
  sapply(seq_len(nrow(df)), function(row) { # loop on each row (safe for 0 rows)
    x <- df[row,] # take the row, just to use a single var later
    ret <- x[['whatiswant']] # initial value
    if(is.na(ret)) { # if empty
      if (x[['value']] %in% df$whatiswant) { # test if this row's value is already used
        ret <- df$value[!df$value %in% df$whatiswant][1] # if yes, take the first value not yet used
      } else {
        ret <- x[['value']] # if not, take this row's value
      }
    }
    if(is.na(ret)) ret <- min(df$value) # no value left, take the min
    df$whatiswant[row] <<- ret # update df outside the sapply closure so the next presence test sees it
  })
  return(df) # return the updated df
}

Output:

> df1[,3] <- NA # Set last column to NA
> res <- t1(df1)
> res
   value new_value whatiswant
1  99.99        NA      99.99
2  99.98        NA      99.96
3  99.97     99.98      99.98
4  99.96        NA      99.95
5  99.95     99.97      99.97
6  99.94        NA      99.94
7  99.93        NA      99.93
8  99.92        NA      99.92
9  99.91        NA      99.91
10 99.90        NA      99.90
11 99.90        NA      99.90
12 99.90        NA      99.90
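
For comparison, here is a minimal sketch of the same greedy logic written as a plain for loop, which avoids the `<<-` assignment inside `sapply`. The name `fill_unused` is hypothetical (it is not part of the question or of any package), and the sketch assumes that `value` contains no NA. It is checked against the whatiswant column from the question:

```r
# Hypothetical helper: same logic as t1, using a for loop over a local vector
fill_unused <- function(df) {
  out <- df$new_value                       # step 1: keep new_value where present
  for (i in seq_len(nrow(df))) {
    if (!is.na(out[i])) next                # already filled from new_value
    if (df$value[i] %in% out) {             # this row's value is already used
      unused <- df$value[!df$value %in% out]
      # first unused value, or the minimum when nothing is left
      out[i] <- if (length(unused) > 0) unused[1] else min(df$value)
    } else {
      out[i] <- df$value[i]                 # value is still free, take it
    }
  }
  df$whatiswant <- out
  df
}

# Data from the question, without the target column
df_in <- data.frame(value = c(99.99,99.98,99.97,99.96,99.95,99.94,
                              99.93,99.92,99.91,99.9,99.9,99.9),
                    new_value = c(NA,NA,99.98,NA,99.97,NA,
                                  NA,NA,NA,NA,NA,NA))
expected <- c(99.99,99.96,99.98,99.95,99.97,99.94,
              99.93,99.92,99.91,99.9,99.9,99.9)

res2 <- fill_unused(df_in)
stopifnot(isTRUE(all.equal(res2$whatiswant, expected)))
```

Both versions rely on exact matching of doubles through `%in%`. That works here because the values come from identical literals, but it may need rounding if the numbers are computed.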

Upvotes: 2
