Kris

Reputation: 63

I'm comparing two columns to create a third...it's not working?

This is definitely a novice question, but I'm stuck and cannot find comparable help online. I am trying to compare two columns of a data frame to create a third column. In the data frame mydf, I'd like to compare Distx and Disty. If there is a value in either, I would like to keep it and place it in a new column, Distz. If they are both "Missing", I'd like to just put "Missing" in Distz. Below is the data frame I'd like to get.

    ID <- c(1, 2, 3, 4, 5, 6)
    Distx <- c("A", "B", "Missing", "Missing", "G", "Missing")
    Disty <- c("Missing", "Missing", "C", "Missing", "Missing", "E")

    mydf <- data.frame(ID, Distx, Disty)

    # Desired result (Distz is the column I want to create):

      ID   Distx   Disty   Distz
    1  1       A Missing       A
    2  2       B Missing       B
    3  3 Missing       C       C
    4  4 Missing Missing Missing
    5  5       G Missing       G
    6  6 Missing       E       E

Here is the code that does not work. At first I thought I wasn't indexing correctly, but the second attempt below gave the same result. There are no error messages, but the results are 1's rather than the actual values of the columns.

    for (i in seq(1:nrow(mydf))){
       if (mydf$Distx[i] == "Missing" && mydf$Disty[i] != "Missing"){
         mydf$Distz[i]<- mydf$Disty[i]}
       if (mydf$Distx[i] != "Missing" && mydf$Disty[i] == "Missing"){
        mydf$Distz[i]<- mydf$Distx[i]}
       if (mydf$Distx[i] == "Missing" && mydf$Disty[i] == "Missing"){
        mydf$Distz[i]<- "Missing"}
    }

    #for the purposes of readability I only ran two of the tests in this code
    within(mydf, {
      Distz <- ifelse(Distx == "Missing" & Disty != "Missing", Disty,
                      ifelse(Distx != "Missing" & Disty == "Missing", Distx))
    })

    #Both results look like this ...???

      ID   Distx   Disty Distz
    1  1       A Missing     1
    2  2       B Missing     1
    3  3 Missing       C     1
    4  4 Missing Missing     1
    5  5       G Missing     1
    6  6 Missing       E     1

Thanks in advance for any help

Upvotes: 3

Views: 133

Answers (2)

akrun

Reputation: 887118

You could also do

    indx <- mydf[-1] != 'Missing'
    mydf$Distz <- mydf[-1][cbind(1:nrow(mydf), max.col(indx))]
    mydf
    #  ID   Distx   Disty   Distz
    #1  1       A Missing       A
    #2  2       B Missing       B
    #3  3 Missing       C       C
    #4  4 Missing Missing Missing
    #5  5       G Missing       G
    #6  6 Missing       E       E
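
One caveat worth sketching, under an assumption about data the question does not show: `max.col()` breaks ties at random by default (`ties.method = "random"`). That is harmless here, because a tie only occurs when both columns are "Missing", but if a row could hold a real value in both columns, `ties.method = "first"` makes the choice deterministic. The data frame `df2` and its fourth row are hypothetical, made up for illustration.

```r
# Hypothetical data: row 4 has a real value in both columns
df2 <- data.frame(ID = 1:4,
                  Distx = c("A", "Missing", "Missing", "D"),
                  Disty = c("Missing", "C", "Missing", "Q"),
                  stringsAsFactors = FALSE)

# Logical matrix: TRUE where a column holds a real value
indx <- df2[-1] != "Missing"

# With ties.method = "first", Distx wins whenever both (or neither) have a value
pick <- max.col(indx, ties.method = "first")

# Matrix indexing: one (row, column) pair per row
df2$Distz <- df2[-1][cbind(seq_len(nrow(df2)), pick)]
df2$Distz
# "A" "C" "Missing" "D"
```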

NOTE: The columns I used are of class 'character'. You can create the 'data.frame' with stringsAsFactors=FALSE so that 'character' columns are not converted to 'factor'. For string comparison and assignment like this, it is easier to work with 'character' columns than with 'factor' columns.
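
A minimal sketch of that note, using base R only. In R versions before 4.0.0, `data.frame()` converted character vectors to factors by default; passing `stringsAsFactors = FALSE` explicitly keeps them as character in any version. The name `mydf_chr` is made up for illustration.

```r
ID    <- c(1, 2, 3, 4, 5, 6)
Distx <- c("A", "B", "Missing", "Missing", "G", "Missing")
Disty <- c("Missing", "Missing", "C", "Missing", "Missing", "E")

# Keep the text columns as character, regardless of the R version's default
mydf_chr <- data.frame(ID, Distx, Disty, stringsAsFactors = FALSE)

# Check the class of the two text columns
sapply(mydf_chr[-1], class)
#       Distx       Disty 
# "character" "character" 
```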

data

    mydf <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Distx = c("A", "B", 
    "Missing", "Missing", "G", "Missing"), Disty = c("Missing", "Missing", 
    "C", "Missing", "Missing", "E")), .Names = c("ID", "Distx", "Disty"
    ), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 1

Thomas

Reputation: 44525

You can try a nested ifelse statement:

    mydf$Distz <- with(mydf, ifelse(Distx == "Missing" & Disty == "Missing", "Missing", 
                               ifelse(Distx != "Missing", as.character(Distx), 
                                 ifelse(Disty != "Missing", as.character(Disty), NA))))
    mydf
    #   ID   Distx   Disty   Distz
    # 1  1       A Missing       A
    # 2  2       B Missing       B
    # 3  3 Missing       C       C
    # 4  4 Missing Missing Missing
    # 5  5       G Missing       G
    # 6  6 Missing       E       E

The problem you were running into with your code is that your variables are of class "factor", not "character". A factor is stored as integer codes with a set of level labels attached, so your code was recording the underlying integer code rather than the level label. This is resolved above by using as.character() to coerce the factors to character.
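
The difference can be seen in a small sketch, assuming a factor column like the one `data.frame()` would have produced by default before R 4.0.0. The names `Distx_f`, `as_codes` and `as_labels` are made up for illustration.

```r
# A factor version of Distx (levels are sorted: A, B, G, Missing)
Distx_f <- factor(c("A", "B", "Missing", "Missing", "G", "Missing"))

levels(Distx_f)       # "A" "B" "G" "Missing"
as.integer(Distx_f)   # 1 2 4 4 3 4  (the underlying integer codes)

# ifelse() drops the factor attributes, so it returns the integer codes
as_codes <- ifelse(Distx_f != "Missing", Distx_f, NA)

# Coercing to character first returns the level labels instead
as_labels <- ifelse(Distx_f != "Missing", as.character(Distx_f), NA)

as_codes    # 1 2 NA NA 3 NA
as_labels   # "A" "B" NA NA "G" NA
```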

Upvotes: 1
