Richard
Richard

Reputation: 1294

R: conditionally replace values in loop

I have a dataframe like so:

rel      <- c(2, 5, NA, 3, 6)
year.in  <- c(4, NA, 2, 3, 2)
year.out <- c(6, 7, NA, 5, 4)
year.1   <- c(NA, NA, NA, NA, NA)
year.2   <- c(NA, NA, NA, NA, NA)
year.3   <- c(NA, NA, NA, NA, NA)
year.4   <- c(NA, NA, NA, NA, NA)
year.5   <- c(NA, NA, NA, NA, NA) 

df <- as.data.frame(cbind(rel, year.in, year.out, year.1, year.2, year.3, 
                      year.4, year.5))

What I would like to do is update the missing values in year.1 - year.5 with the value of 'rel', but only if: (year.in >= year.i AND year.out <= year.i) (with i is 1:5)

Focussing on the just the year of entry, I came up with this:

for (i in 1:5) ifelse(df$year.in < i,
    df[paste("year", i, sep= ".")]<- NA,
    df[paste("year", i, sep= ".")]<- df["rel"])

But this merely replaces all year.i variables with the value of rel.

I have two questions:

Best and thanks in advance,

Richard

Upvotes: 0

Views: 10894

Answers (2)

Justin
Justin

Reputation: 43245

I'd melt your data using the reshape2 package:

library(reshape2)    
df.melt <- melt(df, id.vars=c('rel', 'year.in', 'year.out'))

Dig out the numeric year:

df.melt$year <- as.integer(gsub('year\\.', '', df.melt$variable))

Then use vectorized operations:

subsetter <- with(df.melt, year.in >= year & year.out <= year.out)
subsetter[is.na(subsetter)] <- FALSE
df.melt$value[subsetter] <- df.melt$rel[subsetter]

However, in your example, everything fails your condition.

Using ifelse is perfectly acceptable, however, don't do assignment inside. Instead assign its result to something, like below. The issue was that the two assignments you were performing inside the ifelse were not on each subset but instead acting as though they were just run independently.

for (i in 1:5) {
  year_col <- paste('year', i, sep='.')
  df[[year_col]] <- ifelse(df$year.in >= i & df$year.out <= i,
                         df$rel,
                         df[[year_col]])
}

To answer your bullets:

  • See above.

  • There is nothing specifically wrong with using ifelse and sometimes it is handy to do so for readability. However, it is a "looping" construct and thus can often be replaced by a more efficient vectorized solution.

Upvotes: 1

eddi
eddi

Reputation: 49448

library(data.table)
dt = data.table(df)

for(i in 1:5) dt[year.in <= i & i <= year.out, paste0('year.', i) := rel]

dt
#   rel year.in year.out year.1 year.2 year.3 year.4 year.5
#1:   2       4        6     NA     NA     NA      2      2
#2:   5      NA        7     NA     NA     NA     NA     NA
#3:  NA       2       NA     NA     NA     NA     NA     NA
#4:   3       3        5     NA     NA      3      3      3
#5:   6       2        4     NA      6      6      6     NA

Upvotes: 4

Related Questions