Raul Gonzales
Raul Gonzales

Reputation: 906

Match Function overwritting everything in R

i have this a DF called Test1 and this is how it looks:

Circuit_ID Site_County
001        Yorkshire
002        NA
003        London
004        NA

and i have this DF Called FLR

Circuit_ID Site_County
001        NA
002        Manchester
003        NA
004        Newcastle

What i need to do is fill in the NA values on the Test1 DF without overwritting the values that are already there so this is how i need the Test1 DF to look like:

Circuit_ID Site_County
001        Yorkshire
002        Manchester
003        London
004        Newcastle

I have written the following code:

for (i in 1:nrow(Test1)) {
if (is.na(Test1$Site_County)) {
    Test1$Site_County = FLR.Cramer.Join[match(Test1$Circuit_ID,
                    FLR.Cramer.Join$Circuit_ID), "Site_County"]
}
else {

  }
}

But what the code does is overwrites the whole of the Test1 DF for the data on the FLR DF. I want to keep which ever data is already on the Site_County col of the Test1 DF and only fill in the NA values with the data from the FLR Site_County DF?

Upvotes: 0

Views: 37

Answers (2)

Raul Gonzales
Raul Gonzales

Reputation: 906

I realized that a for loop would not do the trick here so after a sleepless night i managed to get it to work with a simple Ifelse function:

Test1$Site_County = ifelse(is.na(Test1$Site_County),
FLR[match(Test1$Circuit_Reference,FLR$Circuit_Reference), "Site_County"], 
as.character(Test1$Site_County))

if my code is inefficient please let me know so that i can improve my skills further. Thanks Ken S for your original answer though!

Upvotes: 0

KenHBS
KenHBS

Reputation: 7164

You could use merge() for this, while only regarding the cases that do not have NAs in FLR. Also, by first merging Test1 with FLR, you can afterwards remove any duplicate Circuit_ID that have an NA in their Site_County.

rough_df <- merge( Test1, 
                   FLR[complete.cases(FLR$Site_County), ], all=TRUE )
rough_df[!duplicated(rough_df$Circuit_ID), ]
#  Circuit_ID Site_County
#1          1   Yorkshire
#2          2  Manchester
#3          3      London
#4          4   Newcastle

Side information on indexing: I used complete.cases() and duplicated() to index the data frames. Both functions return TRUE/FALSE vectors that I like to use to 'select' only those rows/columns that I need.

Upvotes: 1

Related Questions