Reputation: 63
This is definitely a novice question, but I'm stuck and cannot find comparable help online. I am trying to compare two columns of a data frame to create a third column. In mydf below, I'd like to compare Distx and Disty. If there is a value in either, I would like to keep it and place it in a new column, Distz. If they are both "Missing", I'd like to just put "Missing" in Distz. Below is the data frame I'd like to get.
ID <- c(1, 2, 3, 4, 5, 6)
Distx <- c("A", "B", "Missing", "Missing", "G", "Missing")
Disty <- c("Missing", "Missing", "C", "Missing", "Missing", "E")
mydf <- data.frame(ID, Distx, Disty)
# Desired result, with the new Distz column:
ID Distx Disty Distz
1 1 A Missing A
2 2 B Missing B
3 3 Missing C C
4 4 Missing Missing Missing
5 5 G Missing G
6 6 Missing E E
Here is the code that does not work. At first I thought I wasn't indexing correctly, but the second attempt below gave the same result. There are no error messages, but the results are 1's, not the actual values of the columns.
for (i in seq(1:nrow(mydf))) {
  if (mydf$Distx[i] == "Missing" && mydf$Disty[i] != "Missing") {
    mydf$Distz[i] <- mydf$Disty[i]
  }
  if (mydf$Distx[i] != "Missing" && mydf$Disty[i] == "Missing") {
    mydf$Distz[i] <- mydf$Distx[i]
  }
  if (mydf$Distx[i] == "Missing" && mydf$Disty[i] == "Missing") {
    mydf$Distz[i] <- "Missing"
  }
}
# For the purposes of readability I only ran two of the tests in this code
within(mydf, {
  Distz <- ifelse(Distx == "Missing" & Disty != "Missing", Disty, ifelse(Distx != "Missing" & Disty == "Missing", Distx))
})
#Both results look like this ...???
ID Distx Disty Distz
1 1 A Missing 1
2 2 B Missing 1
3 3 Missing C 1
4 4 Missing Missing 1
5 5 G Missing 1
6 6 Missing E 1
Thanks in advance for any help
Upvotes: 3
Views: 133
Reputation: 887118
You could also do
indx <- mydf[-1]!='Missing'
mydf$Distz <- mydf[-1][cbind(1:nrow(mydf), max.col(indx))]
mydf
# ID Distx Disty Distz
#1 1 A Missing A
#2 2 B Missing B
#3 3 Missing C C
#4 4 Missing Missing Missing
#5 5 G Missing G
#6 6 Missing E E
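To show how the two lines above fit together, here is a step-by-step sketch of the same idea. It assumes the columns are 'character' class; the intermediate name `pick` and the `ties.method = "first"` argument are additions for illustration and are not part of the original answer.

```r
# Rebuild the example data with character (not factor) columns.
mydf <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6),
  Distx = c("A", "B", "Missing", "Missing", "G", "Missing"),
  Disty = c("Missing", "Missing", "C", "Missing", "Missing", "E"),
  stringsAsFactors = FALSE
)

# Step 1: logical matrix, TRUE where a cell holds a real value.
indx <- mydf[-1] != "Missing"

# Step 2: for each row, the position of the column holding the maximum
# (TRUE counts as larger than FALSE). With ties.method = "first" a tie
# resolves to the first column; when both cells are "Missing" either
# choice yields "Missing".
pick <- max.col(indx, ties.method = "first")

# Step 3: index with a two-column (row, column) matrix to pull out
# one cell per row.
mydf$Distz <- mydf[-1][cbind(seq_len(nrow(mydf)), pick)]
mydf
```

Note that if both Distx and Disty held a real value in the same row, this sketch would keep the Distx value; the example data has no such row.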
NOTE: The columns that I used are 'character' class. You could create the 'data.frame' with stringsAsFactors=FALSE so that the 'character' columns are not converted to 'factor' class. For this kind of comparison it is better to work with 'character' class instead of 'factor'.
mydf <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Distx = c("A", "B",
"Missing", "Missing", "G", "Missing"), Disty = c("Missing", "Missing",
"C", "Missing", "Missing", "E")), .Names = c("ID", "Distx", "Disty"
), row.names = c(NA, -6L), class = "data.frame")
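As a minimal sketch of the note above, the data frame can also be built directly from the question's vectors with stringsAsFactors = FALSE. The names `mydf_chr` and `col_classes` are made up for this illustration. From R 4.0.0 onward FALSE is already the default, so the argument mainly matters on older versions.

```r
ID    <- c(1, 2, 3, 4, 5, 6)
Distx <- c("A", "B", "Missing", "Missing", "G", "Missing")
Disty <- c("Missing", "Missing", "C", "Missing", "Missing", "E")

# Keep the string columns as character instead of converting to factor.
mydf_chr <- data.frame(ID, Distx, Disty, stringsAsFactors = FALSE)

# Check the class of every column.
col_classes <- sapply(mydf_chr, class)
print(col_classes)
```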
Upvotes: 1
Reputation: 44525
You can try a nested ifelse statement:
mydf$Distz <- with(mydf, ifelse(Distx == "Missing" & Disty == "Missing", "Missing",
ifelse(Distx != "Missing", as.character(Distx),
ifelse(Disty != "Missing", as.character(Disty), NA))))
mydf
# ID Distx Disty Distz
# 1 1 A Missing A
# 2 2 B Missing B
# 3 3 Missing C C
# 4 4 Missing Missing Missing
# 5 5 G Missing G
# 6 6 Missing E E
The problem you were running into with your code is that your variables are "factor" class, not "character" class, so the code was recording each factor's underlying integer code rather than its level label. This is resolved above by using as.character() to coerce the factors to character.
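The factor behaviour described above can be reproduced in isolation. This is a small sketch, assuming the columns were created as factors (as data.frame() did by default before R 4.0.0); the names `bad` and `good` are only for illustration.

```r
# Explicitly create factors to mimic the old data.frame() default.
Distx <- factor(c("A", "B", "Missing", "Missing", "G", "Missing"))
Disty <- factor(c("Missing", "Missing", "C", "Missing", "Missing", "E"))

# ifelse() drops the factor attributes, so the result holds the
# integer codes of the levels instead of the labels.
bad <- ifelse(Distx != "Missing", Distx, Disty)
print(bad)

# Coercing to character first keeps the labels.
good <- ifelse(Distx != "Missing", as.character(Distx), as.character(Disty))
print(good)
```

Because each factor has its own set of levels, the integer codes in `bad` are not even comparable between Distx and Disty, which is why converting to character before combining is the safer route.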
Upvotes: 1