Reputation: 367
I have a csv file where some of the entries in some columns are blank. I have corresponding columns which have data which could be used to fill in the blank fields. Let's say one of the columns with blanks is called Old Info, and one of the columns with replacement information is called New Info. I don't want to replace Old Info with the New Info, I only want to fill in blanks in Old Info with data from New Info. Data would come from the same row, i.e. if Old Info Row 1 is blank, then information would be taken from New Info Row 1.
Additionally, I have a secondary column that also has replacement information, which could be called Secondary Replacement Info. If Old Info Row 1 is blank, and so is New Info Row 1, then I would want to replace Old Info Row 1 with Secondary Replacement Info Row 1. Here's some example data:
Old Info New Info Secondary Replacement Info
1 Carl Carl Carl
2 Diana Diana
3 Jeremy Jeremy Jeremy
4 Jack
And here's the desired outcome:
Old Info New Info Secondary Replacement Info
1 Carl Carl Carl
2 Diana Diana Diana
3 Jeremy Jeremy Jeremy
4 Jack Jack
So as you can see, the blanks in Old Info have been filled in. Row 2 was filled in by New Info, but Row 4 was filled in by Secondary Replacement Info, as New Info also had a blank. How would I write a function to accomplish all of this?
Upvotes: 1
Views: 1871
Reputation: 132706
#import your data
#don't forget to set stringsAsFactors = FALSE
DF <- read.csv(text = "Old Info,New Info,Secondary Replacement Info
1,Carl,Carl,Carl
2,,Diana,Diana
3,Jeremy,Jeremy,Jeremy
4,,,Jack", stringsAsFactors = FALSE)
#a little function
fun <- function(x, y, z) {
y[y == ""] <- z[y == ""] #substitute missings in y with values from z
x[x == ""] <- y[x == ""] #substitute missings in x with values from y
x #return
}
DF <- within(DF, Old.Info <- fun(Old.Info, New.Info, Secondary.Replacement.Info))
# Old.Info New.Info Secondary.Replacement.Info
#1 Carl Carl Carl
#2 Diana Diana Diana
#3 Jeremy Jeremy Jeremy
#4 Jack Jack
Upvotes: 3
Reputation: 456
This is a for loop, which is probably not that efficient, but it is a function and it should get the job done. I'm inputting the file that I used for input, which is a csv (like your text, but unlike your example input), so please let me know if I should switch that.
#read in csv file
data = read.csv("/Users/amckenz/Documents/github/R-plots/SO/replace_data.txt", fill = T, na.strings = "")
replace_missing_info <- function(data_df){
for(i in 1:nrow(data_df)){
print(i)
if(is.na(data_df[i, 1])){
if(!is.na(data_df[i, 2])){
data_df[i, 1] = data_df[i, 2]
} else if(!is.na(data_df[i, 3])){
data_df[i, 1] = data_df[i, 3]
}
}
}
return(data_df)
}
updated_data = replace_missing_info(data_df = data)
Input data file:
Old Info,New Info,Secondary Replacement Info
Carl,Carl,Carl
,Diana,Diana
Jeremy,Jeremy,Jeremy
,,Jack
ETA: note that I read in the blank entries as NA's so you can use R's nice NA handling system, via the na.strings = "" argument. This could be adjusted if your input data is different.
Second edit (per comment): If you want to access columns by names instead of column number, you can use the following code for the function and function call instead:
replace_missing_info <- function(data_df, column_with_blanks,
primary_replacement_column, secondary_replacement_column){
for(i in 1:nrow(data_df)){
print(i)
if(is.na(data_df[i, column_with_blanks])){
if(!is.na(data_df[i, primary_replacement_column])){
data_df[i, column_with_blanks] = data_df[i, primary_replacement_column]
} else if(!is.na(data_df[i, secondary_replacement_column])){
data_df[i, column_with_blanks] = data_df[i, secondary_replacement_column]
}
}
}
return(data_df)
}
updated_data = replace_missing_info(data_df = data, "Old.Info", "New.Info", "Secondary.Replacement.Info")
Upvotes: 1