Reputation: 55
I got a big data base but for this exercise let's use this one:
c-1 c-2 c-3 c-4 c-5 c-6
1-14 14-29 NA NA NA NA
60-90 CF CF CF CF CF
C 1-14 14-29 CF NA NA
60-90 CF CF CF NA NA
What i want is to change all the columns that got a CF in the column immediately before to a HF so it would look like this:
c-1 c-2 c-3 c-4 c-5 c-6
1-14 14-29 NA NA NA NA
60-90 CF HF HF HF HF
C 1-14 14-29 CF NA NA
60-90 CF HF HF NA NA
I have tried to do it with an apply like this but it isn't working properly
function_a <- function (x) {ifelse(df[,i]=="CF" & df[,i-1]=="CF", "HF", df[,i])}
new_df<- as.data.frame(lapply(length(df):1, function_a))
Please keep in mind I want to preserve the original value if the previous column is not CF, is there any easy way to do this? on R base?
Upvotes: 2
Views: 75
Reputation: 93908
With a matrix of the same size as the original data, looking for "CF"
cells that occur in a row after the first occurrence:
sel <- dat == "CF" & (!is.na(dat))
dat[col(dat) > max.col(sel, "first") & sel] <- "HF"
dat
# c-1 c-2 c-3 c-4 c-5 c-6
#1 1-14 14-29 <NA> <NA> <NA> <NA>
#2 60-90 CF HF HF HF HF
#3 C 1-14 14-29 CF <NA> <NA>
#4 60-90 CF HF HF <NA> <NA>
Using @r2evans' dat
object.
Upvotes: 3
Reputation: 25225
Another suggested solution is to transpose the matrix, lag the matrix by 1 row, then check for CF in the lagged matrix and CF in the transposed matrix and finally transpose back to get the desired output:
tdat <- t(dat)
lagged <- rbind(matrix(NA_character_, ncol=nrow(dat)), tdat[-nrow(tdat),])
tdat[lagged=="CF" & tdat=="CF"] <- "HF"
res <- t(tdat)
res
data:
dat <- read.table(text="c-1 c-2 c-3 c-4 c-5 c-6
1-14 14-29 NA NA NA NA
60-90 CF CF CF CF CF
C 1-14 14-29 CF NA NA
60-90 CF CF CF NA NA", header=TRUE, check.names=FALSE)
Upvotes: 3
Reputation: 160607
I liked this problem, since it is direct to think about it as a rolling apply of sorts, but not in the sense of zoo::rollapply
(though it might be manageable there, too). I'm talking about the use of Reduce(..., accumulate=TRUE)
.
Your data:
dat <- data.frame(
c1 = c('1-14','60-90','C','60-90'),
c2 = c('14-29','CF','1-14','CF'),
c3 = c(NA,'CF','14-29','CF'),
c4 = c(NA,'CF','CF','CF'),
c5 = c(NA,'CF',NA,NA),
c6 = c(NA,'CF',NA,NA),
stringsAsFactors = FALSE
)
I'm adding the logic to check for "HF"
in the previous column, too, since otherwise you'll get alternating "CF"
/"HF"
. I'm inferring that an NA
should always be NA
regardless of previous column, since that's how your example looks. (This assumption can easily be removed.)
func <- function(a, b) ifelse(! is.na(a) & ! is.na(b) & a %in% c("CF","HF"), "HF", b)
By default, Reduce
will return the results of the last column only. With accumulate=TRUE
, it returns the result of each call (plus the first column), which we will use to re-populate a same-sized frame.
df[] <- Reduce(func, df, right=FALSE, accumulate=TRUE)
df
# c1 c2 c3 c4 c5 c6
# 1 1-14 14-29 <NA> <NA> <NA> <NA>
# 2 60-90 CF HF HF HF HF
# 3 C 1-14 14-29 CF <NA> <NA>
# 4 60-90 CF HF HF <NA> <NA>
Upvotes: 4