Triamus

Reputation: 2505

R: Conditional Replacement of values in several columns in data frame

I have read several posts on this, but they all dealt with changing only one column/variable. I need to replace values in a number of columns of a data frame at once. I thought this should work, but it does not, and I cannot figure out why.

positive <- c("Yes", "Science")
temp1 <- c("Yes", "No","","Science", "Only-Child")
temp2 <- c("Yes", "No",""," Yay people!", "Pessimist")
temp3 <- cbind(temp1,temp2)
colnames(temp3) <- c("Feature1","Feature2")
temp <- as.data.frame(temp3)

This does not work:

for (i in temp) {
  ifelse(i %in% positive, 1, i)
}

However, doing it on one column works:

test <- ifelse(temp$Feature1 %in% positive, 1, temp$Feature1)
test

So I suspect that i is not what I want it to be, but a check prints what I expected:

for (i in temp) {
  print(i %in% positive)
}

The output should look like this:

  Feature1     Feature2
         1            1
        No           No

         1  Yay people!
Only-Child    Pessimist

So what am I missing?

Upvotes: 2

Views: 3429

Answers (3)

sedsiv

Reputation: 547

My answer is based on assumptions of what you asked, since you didn't specify what exactly it is you want the result to be.

Your loop tries to return ifelse(temp$Feature_i %in% positive, 1, temp$Feature_i) for all i, that is, for each "column" a vector containing either 1 or the respective element of that "column" of temp. This will not work as you expect. ifelse is a vectorized function, meaning that, as opposed to the if statement, it accepts a vector of logical values as input (+1 for the question). But a vectorized function returns a vector, and all values within a vector must be of the same class (R does the conversion automatically). In your case temp$Feature_i is a factor, and converting it to numeric yields each element's integer level code rather than its label. Thus I am not able to understand your ifelse query.
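The coercion described above can be seen in a minimal sketch. It assumes the column is a factor, which as.data.frame produced by default in R versions before 4.0.0; the names f, coded and kept are made up for illustration.

```r
positive <- c("Yes", "Science")
# Assumption: the column is a factor, as in the question's original setup
f <- factor(c("Yes", "No", "", "Science", "Only-Child"))

# The "no" branch is a factor, so ifelse() keeps only its integer level codes
coded <- ifelse(f %in% positive, 1, f)
print(coded)  # numbers only; the labels "No", "" and "Only-Child" are gone

# Converting to character first keeps the labels; the 1 is coerced to "1"
kept <- ifelse(f %in% positive, 1, as.character(f))
print(kept)
```

The exact level codes depend on how the levels are sorted, so only the second result is predictable across setups.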

If you want to change exactly those entries of temp that appear in positive, and you want to know which elements to change (if that's your intention), then you'd have to start from the following (use sapply, as that is usually faster than for loops):

sapply(temp, function(x) x %in% positive)
     Feature1 Feature2
[1,]     TRUE     TRUE
[2,]    FALSE    FALSE
[3,]    FALSE    FALSE
[4,]     TRUE    FALSE
[5,]    FALSE    FALSE

However, if you strictly need the output shown in your question, then do

sapply(temp, function(x) ifelse(x %in% positive,1,x))

Hth, D


The solution is as follows:

sapply(temp, function(x) ifelse(x %in% positive,1,as.character(x)))
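Note that sapply returns a character matrix here rather than a data frame. If a data frame is needed, one possible variation (a sketch, not part of the answer above; the name result is made up) is to assign the lapply output back into the existing columns:

```r
positive <- c("Yes", "Science")
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist")
)

result <- temp
# result[] keeps the data frame shell and replaces every column in place
result[] <- lapply(temp, function(x) ifelse(x %in% positive, 1, as.character(x)))
str(result)
```

The as.character call makes the sketch behave the same whether the columns start out as factors or as character vectors.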

Upvotes: 1

alexwhan

Reputation: 16026

The first thing that's causing problems in your example is the conversion of strings to factors. Assuming that's fixed, here's a way to get the appropriate indices and assign 1 to them:

temp <- as.data.frame(temp3, stringsAsFactors=FALSE)
temp[apply(temp, 2, function(x) x %in% positive)] <- 1
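With the strings kept as character, the loop from the question can also be made to work. A for loop discards the value of ifelse unless it is assigned, so this sketch iterates over the column names and writes each result back; the loop variable col is a made-up name.

```r
positive <- c("Yes", "Science")
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist"),
  stringsAsFactors = FALSE
)

for (col in names(temp)) {
  # Store the result back in the column; otherwise it is thrown away
  temp[[col]] <- ifelse(temp[[col]] %in% positive, 1, temp[[col]])
}
print(temp)
```

Because each column is character, the replacement value 1 ends up as the string "1".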

Upvotes: 1

coffeinjunky

Reputation: 11514

There probably is a scoping issue in the for-loop. Try

test <- (temp == "Yes" | temp == "Science")

(I assume you want true or false statements as output, right? If not, it might be good to add an example of what you want your final data frame to look like.)

EDIT:

Converting it into a matrix first seems to help. Try:

ind <- (temp == "Yes" | temp == "Science")
tmp <- as.matrix(temp)
tmp[ind] <- 1
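The result tmp is a character matrix, so the 1 is stored as the string "1". If a data frame is wanted afterwards, a possible follow-up (an assumption about the goal, not part of the answer above; the name out is made up) is to convert it back:

```r
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist"),
  stringsAsFactors = FALSE
)

ind <- (temp == "Yes" | temp == "Science")  # logical matrix, one cell per entry
tmp <- as.matrix(temp)
tmp[ind] <- 1                               # coerced to "1" in a character matrix

# Convert back, keeping the columns as character
out <- as.data.frame(tmp, stringsAsFactors = FALSE)
str(out)
```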

Upvotes: 0
