Reputation: 1294
I have a dataframe like so:
rel <- c(2, 5, NA, 3, 6)
year.in <- c(4, NA, 2, 3, 2)
year.out <- c(6, 7, NA, 5, 4)
year.1 <- c(NA, NA, NA, NA, NA)
year.2 <- c(NA, NA, NA, NA, NA)
year.3 <- c(NA, NA, NA, NA, NA)
year.4 <- c(NA, NA, NA, NA, NA)
year.5 <- c(NA, NA, NA, NA, NA)
df <- as.data.frame(cbind(rel, year.in, year.out, year.1, year.2, year.3,
year.4, year.5))
What I would like to do is update the missing values in year.1 - year.5 with the value of 'rel', but only if: (year.in >= year.i AND year.out <= year.i) (with i is 1:5)
Focussing on the just the year of entry, I came up with this:
for (i in 1:5) ifelse(df$year.in < i,
df[paste("year", i, sep= ".")]<- NA,
df[paste("year", i, sep= ".")]<- df["rel"])
But this merely replaces all year.i variables with the value of rel.
I have two questions:
how can I update the year.i variables with the 'rel' values on the conditions mentioned?
is it bad to use the if else statement here?
Best and thanks in advance,
Richard
Upvotes: 0
Views: 10894
Reputation: 43245
I'd melt
your data using the reshape2
package:
library(reshape2)
df.melt <- melt(df, id.vars=c('rel', 'year.in', 'year.out'))
Dig out the numeric year:
df.melt$year <- as.integer(gsub('year\\.', '', df.melt$variable))
Then use vectorized operations:
subsetter <- with(df.melt, year.in >= year & year.out <= year.out)
subsetter[is.na(subsetter)] <- FALSE
df.melt$value[subsetter] <- df.melt$rel[subsetter]
However, in your example, everything fails your condition.
Using ifelse
is perfectly acceptable, however, don't do assignment inside. Instead assign its result to something, like below. The issue was that the two assignments you were performing inside the ifelse
were not on each subset but instead acting as though they were just run independently.
for (i in 1:5) {
year_col <- paste('year', i, sep='.')
df[[year_col]] <- ifelse(df$year.in >= i & df$year.out <= i,
df$rel,
df[[year_col]])
}
To answer your bullets:
See above.
There is nothing specifically wrong with using ifelse
and sometimes it is handy to do so for readability. However, it is a "looping" construct and thus can often be replaced by a more efficient vectorized solution.
Upvotes: 1
Reputation: 49448
library(data.table)
dt = data.table(df)
for(i in 1:5) dt[year.in <= i & i <= year.out, paste0('year.', i) := rel]
dt
# rel year.in year.out year.1 year.2 year.3 year.4 year.5
#1: 2 4 6 NA NA NA 2 2
#2: 5 NA 7 NA NA NA NA NA
#3: NA 2 NA NA NA NA NA NA
#4: 3 3 5 NA NA 3 3 3
#5: 6 2 4 NA 6 6 6 NA
Upvotes: 4