YefR

Reputation: 369

Removing certain rows and replacing values based on a condition

I have the following data:

set.seed(2)
d <- data.frame(iteration=c(1,1,2,2,2,3,4,5,6,6,6),
            value=sample(11),
            var3=sample(11))
   iteration value var3
1          1     3    7
2          1     8    4
3          2     6    8
4          2     2    3
5          2     7    9
6          3     9   11
7          4     1   10
8          5     4    1
9          6    10    2
10         6    11    6
11         6     5    5

Now, I want the following: if an iteration has more than one row, remove its last row and replace the value of the new last row with the value from the removed row. For the example above, this is the output I want:

d <- data.frame(iteration=c(1,2,2,3,4,5,6,6),
          value=c(8,6,7,9,1,4,10,5),
          var3=c(7,8,3,11,10,1,2,6))

  iteration value var3
1         1     8    7
2         2     6    8
3         2     7    3
4         3     9   11
5         4     1   10
6         5     4    1
7         6    10    2
8         6     5    6

Upvotes: 1

Views: 59

Answers (2)

lmo

Reputation: 38510

This base R solution using the split-apply-combine methodology returns the same values as @akrun's data.table version, although the logic appears to be different.

do.call(rbind, lapply(split(d, d$iteration),
                      function(i)
                       if(nrow(i) >= 3) i[-(nrow(i)-1),] else tail(i, 1)))
     iteration value
1            1     8
2.3          2     6
2.5          2     7
3            3     9
4            4     1
5            5     4
6.9          6    10
6.11         6     5

The idea is to split the data.frame into a list of data.frames by iteration and then, for each one, check whether it has at least 3 rows. If so, drop the second-to-last row; if not, return only the final row. do.call with rbind then combines the pieces into a single data.frame.

Note that this will not produce the requested output when other variables such as var3 are present: whole rows are dropped, so the last row of each group keeps its own var3 instead of the var3 of the row before it.
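
For the case with extra columns, one possible base R variation on the same split-apply-combine idea is sketched below. The data is typed in from the table printed in the question rather than generated with sample(), and fix_group is a hypothetical helper name chosen for this sketch, not part of the original answer.

```r
# Data typed in from the table printed in the question
d <- data.frame(iteration = c(1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6),
                value     = c(3, 8, 6, 2, 7, 9, 1, 4, 10, 11, 5),
                var3      = c(7, 4, 8, 3, 9, 11, 10, 1, 2, 6, 5))

# Hypothetical helper: within one group, copy the last value into the
# second-to-last row, then drop the last row. Single-row groups are
# returned unchanged.
fix_group <- function(g) {
  n <- nrow(g)
  if (n > 1) {
    g$value[n - 1] <- g$value[n]
    g <- g[-n, ]
  }
  g
}

res <- do.call(rbind, lapply(split(d, d$iteration), fix_group))
rownames(res) <- NULL  # discard the group-based row names added by rbind
res
```

Because only value is overwritten before the last row is removed, every other column (here var3) stays attached to the row it started on.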

Upvotes: 2

akrun

Reputation: 887213

We can use data.table

library(data.table)
setDT(d)[, .(value = if(.N>1) c(value[seq_len(.N-2)], value[.N]) else value), iteration]
#   iteration value
#1:         1     8
#2:         2     6
#3:         2     7
#4:         3     9
#5:         4     1
#6:         5     4
#7:         6    10
#8:         6     5

Update

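Based on the update in the OP's post, we can first create a new column 'value1' holding the lead (next) values of 'value' within each iteration, assign 'value1' to 'value' only for the rows indexed by 'i1' (the second-to-last row of each iteration that has more than one row), and then drop the last row of each such iteration.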

setDT(d)[, value1 := shift(value, type = "lead"), iteration]
i1 <- d[, if(.N >1) .I[.N-1], iteration]$V1 
d[i1, value := value1]
d[d[, if(.N > 1) .I[-.N] else .I, iteration]$V1][, value1 := NULL][]
#   iteration value var3
#1:         1     8    7
#2:         2     6    8
#3:         2     7    3
#4:         3     9   11
#5:         4     1   10
#6:         5     4    1
#7:         6    10    2
#8:         6     5    6
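
The same result can also be sketched as a single grouped expression that does not modify the input by reference. This is an illustrative alternative rather than the code above; the data is typed in from the question's table, and dt, keep, and v are names chosen for this sketch.

```r
library(data.table)

# Data typed in from the table printed in the question
dt <- data.table(iteration = c(1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6),
                 value     = c(3, 8, 6, 2, 7, 9, 1, 4, 10, 11, 5),
                 var3      = c(7, 4, 8, 3, 9, 11, 10, 1, 2, 6, 5))

res <- dt[, {
  # Rows to keep: all but the last, or the only row of a single-row group
  keep <- if (.N > 1) seq_len(.N - 1) else 1L
  v <- value[keep]
  # The last kept row takes the value of the group's original last row
  v[length(v)] <- value[.N]
  .(value = v, var3 = var3[keep])
}, by = iteration]
res
```

The columns to carry along (only var3 here) have to be listed explicitly in this form, so the shift-based version above may be more convenient when there are many of them.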

Upvotes: 3
