I'm using RStudio 1.3.959 with R 4.0.2 on macOS Catalina 10.15.5.
Maybe I'm overthinking this, but I haven't been able to find a function that does what I need.
I have two data frames, df_all_data and df_outliers, each with 533 columns and 10,338 rows and identical formatting. Some values are missing from df_all_data. Those cells contain the label "outlier", and their actual values are in df_outliers, which is blank wherever df_all_data already has a regular value. I would like to combine df_all_data and df_outliers without the blanks in df_outliers overwriting the values in df_all_data. For example, in Excel's Paste Special you can choose to skip blanks. The only cells I want to overwrite are the ones filled with "outlier", which mark where the data from df_outliers go. Some data are truly missing, which is why there are NAs. Condensed example data frames are below:
df_outliers = data.frame(Reference.Mass = c(256.2402292, 257.0324221, 257.0357941), GC1 = c(436955360, "", NA), GC2 = c(480996256, "", ""), GC3 = c(386362944, "", NA))
  Reference.Mass       GC1       GC2       GC3
1       256.2402 436955360 480996256 386362944
2       257.0324
3       257.0358      <NA>                <NA>
df_all_data = data.frame(Reference.Mass = c(256.2402292, 257.0324221, 257.0357941), GC1 = c("outlier", 6109980, NA), GC2 = c("outlier", 7437798, 2721256), GC3 = c("outlier", 8958061, NA))
  Reference.Mass     GC1     GC2     GC3
1       256.2402 outlier outlier outlier
2       257.0324 6109980 7437798 8958061
3       257.0358    <NA> 2721256    <NA>
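The Excel "skip blanks" idea can be expressed directly in base R by building the mask from df_outliers instead: mark every cell that is neither NA nor "" and copy only those. This is a minimal sketch on the condensed data above. It assumes both frames have the same rows in the same order and that Reference.Mass (column 1) is a key that should never be copied. The names gc_cols, filled and target are illustrative only.

```r
# Condensed example data from the question; stringsAsFactors = FALSE keeps the
# value columns as character even on R versions older than 4.0
df_outliers <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c(436955360, "", NA), GC2 = c(480996256, "", ""), GC3 = c(386362944, "", NA),
  stringsAsFactors = FALSE
)
df_all_data <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c("outlier", 6109980, NA), GC2 = c("outlier", 7437798, 2721256),
  GC3 = c("outlier", 8958061, NA),
  stringsAsFactors = FALSE
)

# Value columns only; Reference.Mass is treated as a key and left alone
gc_cols <- names(df_all_data)[-1]

# Logical matrix, TRUE where df_outliers has something to paste;
# blank ("") and NA cells are skipped, like Excel's "skip blanks"
filled <- !is.na(df_outliers[gc_cols]) & df_outliers[gc_cols] != ""

# Paste only those cells; every other cell of df_all_data keeps its value
target <- df_all_data[gc_cols]
target[filled] <- df_outliers[gc_cols][filled]
df_all_data[gc_cols] <- target
```

On this data the result is the same as masking on the "outlier" markers, because df_outliers only has values where df_all_data says "outlier". The two approaches would differ if that ever stopped being true: this mask is driven by what df_outliers contains, not by where the markers are.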
I've tried merge, cbind, full_join and left_join. None of them overlays df_outliers onto df_all_data without overwriting the existing values in df_all_data. There are roughly 4,000 outlier values that need to be inserted into df_all_data, each in the correct row and column, and I don't want to add any rows or columns. If anyone can shed light on whether this is possible, how to set it up with one of these functions, or whether there are other options, that would be much appreciated. Thank you!
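Because any cell-by-cell overlay is purely positional, it can be worth confirming first that the two frames really line up. This is a hedged sketch: check_alignment is a hypothetical helper, and it assumes Reference.Mass identifies each row and appears in exactly the same order in both frames.

```r
# Condensed example data from the question
df_outliers <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c(436955360, "", NA), GC2 = c(480996256, "", ""), GC3 = c(386362944, "", NA),
  stringsAsFactors = FALSE
)
df_all_data <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c("outlier", 6109980, NA), GC2 = c("outlier", 7437798, 2721256),
  GC3 = c("outlier", 8958061, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if the frames do not line up cell for cell, then
# count the "outlier" markers and how many of them lack a usable replacement
check_alignment <- function(all_data, outliers, key = "Reference.Mass") {
  stopifnot(
    identical(dim(all_data), dim(outliers)),       # same number of rows and columns
    identical(names(all_data), names(outliers)),   # same column names, same order
    identical(all_data[[key]], outliers[[key]])    # same row keys, same order
  )
  marker <- all_data == "outlier" & !is.na(all_data)
  replacement <- outliers[marker]                  # values under each marker
  no_fill <- is.na(replacement) | replacement == ""
  c(markers = sum(marker), markers_without_value = sum(no_fill))
}

check_alignment(df_all_data, df_outliers)
```

A non-zero markers_without_value count would point to "outlier" cells that df_outliers cannot fill.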
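To overwrite only the cells marked "outlier", create a logical matrix that is TRUE exactly at those cells and use it to index both data frames in the assignment:
# TRUE only where df_all_data holds "outlier"; NA cells compare as NA, and NA & FALSE is FALSE
i1 <- df_all_data == 'outlier' & !is.na(df_all_data)
# copy the cells at the same positions in df_outliers; all other cells stay as they are
df_all_data[i1] <- df_outliers[i1]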
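Applied to the condensed example data, the two lines above can be run end to end as follows. This is a sketch: the data are rebuilt from the question, and the final numeric conversion is an optional extra step that is not part of the answer itself.

```r
# Condensed example data from the question; stringsAsFactors = FALSE keeps the
# value columns as character even on R versions older than 4.0
df_outliers <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c(436955360, "", NA), GC2 = c(480996256, "", ""), GC3 = c(386362944, "", NA),
  stringsAsFactors = FALSE
)
df_all_data <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c("outlier", 6109980, NA), GC2 = c("outlier", 7437798, 2721256),
  GC3 = c("outlier", 8958061, NA),
  stringsAsFactors = FALSE
)

# Logical matrix, TRUE only where df_all_data holds "outlier".
# An NA cell compares as NA, and NA & FALSE is FALSE, so the mask contains no NAs
i1 <- df_all_data == "outlier" & !is.na(df_all_data)

# Fill those cells from the same positions in df_outliers; nothing else changes
df_all_data[i1] <- df_outliers[i1]

# Optional: the value columns are character because "outlier" was mixed with
# numbers; once no markers remain they can be converted back to numeric
df_all_data[-1] <- lapply(df_all_data[-1], as.numeric)
```

After this runs, row 1 holds 436955360, 480996256 and 386362944, rows 2 and 3 are unchanged, the genuine NAs are still NA, and no rows or columns have been added.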