Daniel Beltran
Daniel Beltran

Reputation: 55

R changing data frame columns value comparing with previous column

I got a big data base but for this exercise let's use this one:

    c-1    c-2   c-3  c-4  c-5 c-6  
   1-14  14-29  NA    NA   NA  NA
   60-90  CF    CF    CF   CF  CF
    C     1-14  14-29 CF   NA  NA
   60-90  CF    CF    CF   NA  NA

What i want is to change all the columns that got a CF in the column immediately before to a HF so it would look like this:

   c-1    c-2   c-3  c-4  c-5 c-6  
   1-14  14-29  NA    NA   NA  NA
   60-90  CF    HF    HF   HF  HF
    C     1-14  14-29 CF   NA  NA
   60-90  CF    HF    HF   NA  NA

I have tried to do it with an apply like this but it isn't working properly

function_a <- function (x) {ifelse(df[,i]=="CF" & df[,i-1]=="CF", "HF", df[,i])}    
new_df<- as.data.frame(lapply(length(df):1, function_a))

Please keep in mind I want to preserve the original value if the previous column is not CF, is there any easy way to do this? on R base?

Upvotes: 2

Views: 75

Answers (3)

thelatemail
thelatemail

Reputation: 93908

With a matrix of the same size as the original data, looking for "CF" cells that occur in a row after the first occurrence:

sel <- dat == "CF" & (!is.na(dat))
dat[col(dat) > max.col(sel, "first") & sel] <- "HF"
dat
#    c-1   c-2   c-3  c-4  c-5  c-6
#1  1-14 14-29  <NA> <NA> <NA> <NA>
#2 60-90    CF    HF   HF   HF   HF
#3     C  1-14 14-29   CF <NA> <NA>
#4 60-90    CF    HF   HF <NA> <NA>

Using @r2evans' dat object.

Upvotes: 3

chinsoon12
chinsoon12

Reputation: 25225

Another suggested solution is to transpose the matrix, lag the matrix by 1 row, then check for CF in the lagged matrix and CF in the transposed matrix and finally transpose back to get the desired output:

tdat <- t(dat)
lagged <- rbind(matrix(NA_character_, ncol=nrow(dat)), tdat[-nrow(tdat),])
tdat[lagged=="CF" & tdat=="CF"] <- "HF"
res <- t(tdat)
res

data:

dat <- read.table(text="c-1    c-2   c-3  c-4  c-5 c-6  
1-14  14-29  NA    NA   NA  NA
60-90  CF    CF    CF   CF  CF
C     1-14  14-29 CF   NA  NA
60-90  CF    CF    CF   NA  NA", header=TRUE, check.names=FALSE)

Upvotes: 3

r2evans
r2evans

Reputation: 160607

I liked this problem, since it is direct to think about it as a rolling apply of sorts, but not in the sense of zoo::rollapply (though it might be manageable there, too). I'm talking about the use of Reduce(..., accumulate=TRUE).

Your data:

dat <- data.frame(
  c1 = c('1-14','60-90','C','60-90'),
  c2 = c('14-29','CF','1-14','CF'),
  c3 = c(NA,'CF','14-29','CF'),
  c4 = c(NA,'CF','CF','CF'),
  c5 = c(NA,'CF',NA,NA),
  c6 = c(NA,'CF',NA,NA),
  stringsAsFactors = FALSE
)

I'm adding the logic to check for "HF" in the previous column, too, since otherwise you'll get alternating "CF"/"HF". I'm inferring that an NA should always be NA regardless of previous column, since that's how your example looks. (This assumption can easily be removed.)

func <- function(a, b) ifelse(! is.na(a) & ! is.na(b) & a %in% c("CF","HF"), "HF", b)

By default, Reduce will return the results of the last column only. With accumulate=TRUE, it returns the result of each call (plus the first column), which we will use to re-populate a same-sized frame.

df[] <- Reduce(func, df, right=FALSE, accumulate=TRUE)
df
#      c1    c2    c3   c4   c5   c6
# 1  1-14 14-29  <NA> <NA> <NA> <NA>
# 2 60-90    CF    HF   HF   HF   HF
# 3     C  1-14 14-29   CF <NA> <NA>
# 4 60-90    CF    HF   HF <NA> <NA>

Upvotes: 4

Related Questions