Reputation: 25
I would like to replace blank cells (" ") in a column with "no". The missing entries have a meaning for me (no score determined yet), and I want to use the factor variable in a regression tree later.
I found a similar question (Replace blank cells with character) and tried the following, but the blank cells are converted to NA rather than to my text:
> Test$SCORE[Test$SCORE==" "]<- "no"
Warning message:
In `[<-.factor`(`*tmp*`, Test$SCORE == " ", value = c(NA, NA, 8L, :
invalid factor level, NA generated
Is there a way to avoid NA and use my own text?
Please see example data "Test":
ID Score
1. A
2. " "
3. B
4. " "
5. C
This is the result I would like to achieve:
ID Score
1 A
2 "no"
3 B
4 "no"
5 C
The dataset is very large, so a manual solution that indexes specific rows would be very time-consuming. I appreciate your help, as I am quite new to R.
Thank you very much in advance.
Additional info:
str(Test$SCORE) Factor w/ 13 levels " ","A","B","C",..
Please excuse the format of the example table, but this is my first question.
Upvotes: 2
Views: 12057
Reputation: 3833
> df <- data.frame(Test=1:5,Score=c("A"," ","B"," "," "))
> df
Test Score
1 1 A
2 2
3 3 B
4 4
5 5
> df[,2] <- as.character(df$Score)
> is.character(df[,2])
[1] TRUE
> df$Score[df$Score==" "] <- "No"
> df
Test Score
1 1 A
2 2 No
3 3 B
4 4 No
5 5 No
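The question mentions using the column as a factor in a regression tree later, so the character column would presumably need converting back afterwards. A minimal sketch on the same toy data (the `stringsAsFactors = TRUE` argument is an assumption added here to reproduce the factor column from the question, since `data.frame()` has not created factors by default since R 4.0):

```r
# Toy data with Score stored as a factor, as in the question.
df <- data.frame(Test = 1:5, Score = c("A", " ", "B", " ", " "),
                 stringsAsFactors = TRUE)

# Convert to character, replace the blanks, then convert back to a factor.
df$Score <- as.character(df$Score)
df$Score[df$Score == " "] <- "No"
df$Score <- factor(df$Score)

str(df$Score)
```

The round trip rebuilds the levels from the values that are actually present, so the blank level disappears and "No" becomes a regular level.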
Upvotes: 0
Reputation: 132706
Work on the factor levels:
DF <- read.table(text = 'ID Score
1. A
2. " "
3. B
4. " "
5. C', header = TRUE, stringsAsFactors = TRUE)  # Score must be a factor; not the default since R 4.0
levels(DF$Score)[levels(DF$Score) == " "] <- "no"
# ID Score
#1 1 A
#2 2 no
#3 3 B
#4 4 no
#5 5 C
This is very efficient because a factor usually has far fewer levels than the vector has elements.
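For comparison, here is a small self-contained sketch of two ways to get "no" into a factor without triggering the "invalid factor level" warning. The vector `scores` and the names `renamed` and `assigned` are made up for illustration; the second variant is an alternative and is not part of the approach above.

```r
# A small factor that contains the blank level " ".
scores <- factor(c("A", " ", "B", " ", "C"))

# Option 1: rename the level itself; every " " element becomes "no".
renamed <- scores
levels(renamed)[levels(renamed) == " "] <- "no"

# Option 2: register "no" as a valid level first, then assign by element,
# and finally drop the now unused " " level.
assigned <- scores
levels(assigned) <- c(levels(assigned), "no")
assigned[assigned == " "] <- "no"
assigned <- droplevels(assigned)

print(renamed)
print(assigned)
```

Both variants hold the same values. They differ in level order ("no" takes the place of " " in the first and is appended at the end in the second), and the second touches every matching element rather than a single level.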
Upvotes: 7