xxxRxxx
xxxRxxx

Reputation: 367

How do I fill in blank cells?

I have a csv file where some of the entries in some columns are blank. I have corresponding columns which have data which could be used to fill in the blank fields. Let's say one of the columns with blanks is called Old Info, and one of the columns with replacement information is called New Info. I don't want to replace Old Info with the New Info, I only want to fill in blanks in Old Info with data from New Info. Data would come from the same row, i.e. if Old Info Row 1 is blank, then information would be taken from New Info Row 1.

Additionally, I have a secondary column that also has replacement information, which could be called Secondary Replacement Info. If Old Info Row 1 is blank, and so is New Info Row 1, then I would want to replace Old Info Row 1 with Secondary Replacement Info Row 1. Here's some example data:

    Old Info     New Info   Secondary Replacement Info
1      Carl         Carl               Carl
2                   Diana              Diana
3      Jeremy       Jeremy             Jeremy
4                                      Jack

And here's the desired outcome:

    Old Info     New Info   Secondary Replacement Info
1      Carl         Carl               Carl
2      Diana        Diana              Diana
3      Jeremy       Jeremy             Jeremy
4      Jack                            Jack

So as you can see, the blanks in Old Info have been filled in. Row 2 was filled in by New Info, but Row 4 was filled in by Secondary Replacement Info, as New Info also had a blank. How would I write a function to accomplish all of this?

Upvotes: 1

Views: 1871

Answers (2)

Roland
Roland

Reputation: 132706

#import your data
#don't forget to set stringsAsFactors = FALSE
DF <- read.csv(text = "Old Info,New Info,Secondary Replacement Info
1,Carl,Carl,Carl
2,,Diana,Diana
3,Jeremy,Jeremy,Jeremy
4,,,Jack", stringsAsFactors = FALSE)

#a little function
fun <- function(x, y, z) {
  y[y == ""] <- z[y == ""] #substitute missings in y with values from z
  x[x == ""] <- y[x == ""] #substitute missings in x with values from y
  x #return
}

DF <- within(DF, Old.Info <- fun(Old.Info, New.Info, Secondary.Replacement.Info))
#  Old.Info New.Info Secondary.Replacement.Info
#1     Carl     Carl                       Carl
#2    Diana    Diana                      Diana
#3   Jeremy   Jeremy                     Jeremy
#4     Jack                                Jack

Upvotes: 3

Andy McKenzie
Andy McKenzie

Reputation: 456

This is a for loop, which is probably not that efficient, but it is a function and it should get the job done. I'm inputting the file that I used for input, which is a csv (like your text, but unlike your example input), so please let me know if I should switch that.

#read in csv file
data = read.csv("/Users/amckenz/Documents/github/R-plots/SO/replace_data.txt", fill = T, na.strings = "")

replace_missing_info <- function(data_df){

    for(i in 1:nrow(data_df)){
        print(i)
        if(is.na(data_df[i, 1])){
            if(!is.na(data_df[i, 2])){
                data_df[i, 1] = data_df[i, 2]
            } else if(!is.na(data_df[i, 3])){
                data_df[i, 1] = data_df[i, 3]
            }
        }
    }

    return(data_df)

}

updated_data = replace_missing_info(data_df = data)

Input data file:

Old Info,New Info,Secondary Replacement Info
Carl,Carl,Carl
,Diana,Diana
Jeremy,Jeremy,Jeremy
,,Jack

ETA: note that I read in the blank entries as NA's so you can use R's nice NA handling system, via the na.strings = "" argument. This could be adjusted if your input data is different.

Second edit (per comment): If you want to access columns by names instead of column number, you can use the following code for the function and function call instead:

replace_missing_info <- function(data_df, column_with_blanks, 
    primary_replacement_column, secondary_replacement_column){

    for(i in 1:nrow(data_df)){
        print(i)
        if(is.na(data_df[i, column_with_blanks])){
            if(!is.na(data_df[i, primary_replacement_column])){
                data_df[i, column_with_blanks] = data_df[i, primary_replacement_column]
            } else if(!is.na(data_df[i, secondary_replacement_column])){
                data_df[i, column_with_blanks] = data_df[i, secondary_replacement_column]
            }
        }
    }

    return(data_df)

}

updated_data = replace_missing_info(data_df = data, "Old.Info", "New.Info", "Secondary.Replacement.Info")

Upvotes: 1

Related Questions