EGM8686
EGM8686

Reputation: 1572

Replace values in set of columns based on condition

I have a dataframe like this

id  v1  v2  v3  v4  v5  pos
1   11  12  11  10  10  3
2   17  11  22  40  23  4
1   11  22  50  10  10  2

I would like to change its values based on a condition related to pos to get:

id  v1  v2  v3  v4  v5  pos
1   11  12  12  12  12  3
2   17  11  22  22  22  4
1   11  11  11  11  11  2

So basically values get the previous value and the variable pos defines from where should we start.

Thx!

Upvotes: 1

Views: 56

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389265

Using apply from base R

data.frame(t(apply(df, 1, function(x) 
     c(x[1:x["pos"]], rep(x[x["pos"]], ncol(df) - x["pos"] - 2), x['pos']))))

#  X1 X2 X3 X4 X5 X6
#1  1 11 12 12 12  3
#2  2 17 11 22 22  4
#3  1 11 11 11 11  2

Upvotes: 1

thelatemail
thelatemail

Reputation: 93938

An approach using some indexing, which should be efficient in running time.
Not super efficient in terms of memory however, due to making a copy the same size as the input data:

vars <- paste0("v",1:5)
nv <- dat[vars][cbind(seq_len(nrow(dat)), dat$pos-1)]
ow <- col(dat[vars]) >= dat$pos
dat[vars][ow] <- nv[row(ow)[ow]]

#  id v1 v2 v3 v4 v5 pos
#1  1 11 12 12 12 12   3
#2  2 17 11 22 22 22   4
#3  1 11 11 11 11 11   2

Explanation:

Get the variables of interest:

vars <- paste0("v",1:5)

Get the new values to overwrite for each row:

nv <- dat[vars][cbind(seq_len(nrow(dat)), dat$pos-1)]

Make a logical matrix of the cells to overwrite

ow <- col(dat[vars]) >= dat$pos

Overwrite the cells using a row identifier to pick the appropriate value.

dat[vars][ow] <- nv[row(ow)[ow]]

Quick comparative timing using a larger dataset:

dat <- dat[rep(1:3,1e6),]

# indexing
#   user  system elapsed 
#   1.36    0.31    1.68 

# apply
#   user  system elapsed 
#  77.30    0.83   78.41 

# gather/spread
#   user  system elapsed 
# 293.43    3.64  299.10

Upvotes: 2

www
www

Reputation: 39174

Here is one idea with gather and spread.

library(tidyverse)

dat2 <- dat %>%
  rowid_to_column() %>%
  gather(v, value, starts_with("v")) %>%
  group_by(rowid) %>%
  mutate(value = ifelse(row_number() >= (pos - 1), nth(value, (pos - 1)[[1]]), value)) %>%
  spread(v, value) %>%
  ungroup() %>%
  select(names(dat))

dat2
# # A tibble: 3 x 7
#      id    v1    v2    v3    v4    v5   pos
#   <int> <int> <int> <int> <int> <int> <int>
# 1     1    11    12    12    12    12     3
# 2     2    17    11    22    22    22     4
# 3     1    11    11    11    11    11     2

DATA

dat <- read.table(text = "id  v1  v2  v3  v4  v5  pos
1   11  12  11  10  10  3
2   17  11  22  40  23  4
1   11  22  50  10  10  2",
                  header = TRUE)
library(tidyverse)

Upvotes: 1

Related Questions