Reputation: 2505
I have read several posts on this, but they all dealt with changing only one column/variable. I need to replace values in a number of columns in a dataframe at once. I thought this would work, but it does not, and I cannot figure out why.
positive <- c("Yes", "Science")
temp1 <- c("Yes", "No","","Science", "Only-Child")
temp2 <- c("Yes", "No",""," Yay people!", "Pessimist")
temp3 <- cbind(temp1,temp2)
colnames(temp3) <- c("Feature1","Feature2")
temp <- as.data.frame(temp3)
This does not work:
for (i in temp) {
ifelse(i %in% positive, 1, i)
}
However, doing it on one column works:
test <- ifelse(temp$Feature1 %in% positive, 1, temp$Feature1)
test
So I suspected that i is not what I want it to be, but a check prints what I expected:
for (i in temp) {
print(i %in% positive)
}
The output should look like this:
Feature1 Feature2
1 1
No No
1 Yay people!
Only-Child Pessimist
So what am I missing?
Upvotes: 2
Views: 3429
Reputation: 547
My answer is based on assumptions of what you asked, since you didn't specify what exactly it is you want the result to be.
Your loop tries to return ifelse(temp$Feature_i %in% positive, 1, temp$Feature_i) for every column i. For each column, that call returns a vector holding either 1 or the corresponding element of that column of temp. This will not give what you expect. ifelse is a vectorized function, meaning it can, as opposed to the if statement, take a vector of logical values as input (+1 for the question). But a vector returned by a vectorized function holds values of a single class (R does the conversion automatically). In your case temp$Feature_i is a factor, and the conversion to numeric replaces each element with the integer code of its factor level. In addition, nothing inside the loop stores or prints the result, so temp is never changed. Thus I cannot tell what your ifelse call is meant to produce.
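To make the factor conversion concrete, here is a minimal sketch. It assumes the column is a factor, which is what as.data.frame() produced by default before R 4.0; the names feature1, codes and labels are only illustrative.

```r
# One column from the question, built explicitly as a factor.
positive <- c("Yes", "Science")
feature1 <- factor(c("Yes", "No", "", "Science", "Only-Child"))

# The 'no' branch is a factor, so ifelse() keeps only its integer level codes.
codes <- ifelse(feature1 %in% positive, 1, feature1)
print(codes)

# Converting to character first keeps the original labels.
labels <- ifelse(feature1 %in% positive, 1, as.character(feature1))
print(labels)
```

The second call returns a character vector, so the replacement value 1 ends up as the string "1".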
If you want to change exactly those entries in temp that match positive, and you first want to know which elements to change (if that's your intention), then start from the following (use sapply, as that is usually more concise than a for loop):
sapply(temp, function(x) x %in% positive)
Feature1 Feature2
[1,] TRUE TRUE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] TRUE FALSE
[5,] FALSE FALSE
However if you strictly need the output you suggested in your third code block then do
sapply(temp, function(x) ifelse(x %in% positive,1,x))
Hth, D
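Because the columns are factors, wrap x in as.character() so that the labels are kept instead of the integer codes. The solution is as follows: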
sapply(temp, function(x) ifelse(x %in% positive,1,as.character(x)))
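Note that sapply returns a character matrix here, not a data frame. If the result should stay a data frame, one possible variation (a sketch, not part of the original answer) is to use lapply and assign into temp[]. The data frame is rebuilt with stringsAsFactors = TRUE only to mimic the factor columns from the question.

```r
positive <- c("Yes", "Science")
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist"),
  stringsAsFactors = TRUE  # assumption: factor columns, as in the question
)

# temp[] keeps the data frame shape; lapply() rewrites each column in turn.
temp[] <- lapply(temp, function(x) ifelse(x %in% positive, "1", as.character(x)))
print(temp)
```

Both columns end up as character vectors, with "1" in place of every value found in positive.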
Upvotes: 1
Reputation: 16026
The first thing that's causing problems in your example is the conversion of strings to factors. Assuming that's fixed, here's a way to get the appropriate indices and assign 1 to them:
temp <- as.data.frame(temp3, stringsAsFactors=FALSE)
temp[apply(temp, 2, function(x) x %in% positive)] <- 1
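For reference, here is a self-contained version of the same idea that can be run on its own. It is only a sketch: the intermediate name mask is illustrative, and the data frame is built directly rather than through cbind.

```r
positive <- c("Yes", "Science")
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist"),
  stringsAsFactors = FALSE  # keep the columns as character
)

# Logical matrix with the same shape as temp: TRUE where a value is in positive.
mask <- apply(temp, 2, function(x) x %in% positive)

# Indexing a data frame with a logical matrix replaces the matching cells.
temp[mask] <- 1
print(temp)
```

Because the columns are character, the assigned 1 is stored as the string "1".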
Upvotes: 1
Reputation: 11514
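The for-loop most likely fails because the value of ifelse() is computed and then discarded on each iteration; nothing is assigned back to temp. A vectorized comparison avoids the loop entirely. Try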
test <- (temp == "Yes" | temp == "Science")
(I assume you want true or false statements as output, right? If not, it might be good to add an example of what you want your final dataframe to look like.)
EDIT:
Converting it into a matrix first seems to help. Try:
ind <- (temp == "Yes" | temp == "Science")
tmp <- as.matrix(temp)
tmp[ind] <- 1
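If you would rather keep a for-loop, it has to write its result back into the data frame. A minimal sketch, assuming character columns; the names col and hit are only illustrative:

```r
positive <- c("Yes", "Science")
temp <- data.frame(
  Feature1 = c("Yes", "No", "", "Science", "Only-Child"),
  Feature2 = c("Yes", "No", "", " Yay people!", "Pessimist"),
  stringsAsFactors = FALSE  # assumption: character columns, not factors
)

# Loop over column names so each result can be assigned back into temp.
for (col in names(temp)) {
  hit <- temp[[col]] %in% positive
  temp[[col]][hit] <- "1"
}
print(temp)
```

Looping over names(temp) rather than over temp itself is what makes the assignment possible, since for (i in temp) only hands you a copy of each column.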
Upvotes: 0