sacrebleucarbon

Reputation: 13

Combining data frames in R without overwriting data with blanks

RStudio Version 1.3.959 (what I'm using), R version 4.0.2, MacOS Catalina, version 10.15.5.

Maybe I'm overthinking this, but I haven't been able to find a function that does what I need. I have two data frames, each with 533 columns and 10,338 rows, in identical layouts. In df_all_data, some cells hold the placeholder "outlier" instead of a value. The real values for those cells are in df_outliers, which is otherwise blank and does not contain the other values from df_all_data. I would like to combine df_all_data and df_outliers without the blanks in df_outliers overwriting the values in df_all_data, much like the "Skip blanks" option in Excel's Paste Special. The only cells I want to overwrite are the ones filled with "outlier", which mark where the data from df_outliers go. There are also truly missing data, which is why there are NAs. Condensed example data frames are below:

df_outliers = data.frame(Reference.Mass = c(256.2402292, 257.0324221, 257.0357941), GC1 = c(436955360, "", NA), GC2 = c(480996256, "", ""), GC3 = c(386362944, "", NA))

  Reference.Mass       GC1       GC2       GC3
1       256.2402 436955360 480996256 386362944
2       257.0324                              
3       257.0358      <NA>                <NA>

df_all_data = data.frame(Reference.Mass = c(256.2402292, 257.0324221, 257.0357941), GC1 = c("outlier", 6109980, NA), GC2 = c("outlier", 7437798, 2721256), GC3 = c("outlier", 8958061, NA))

  Reference.Mass     GC1     GC2     GC3
1       256.2402 outlier outlier outlier
2       257.0324 6109980 7437798 8958061
3       257.0358    <NA> 2721256    <NA>

I've tried merge, cbind, full_join, and left_join. None of them overlays df_outliers without overwriting df_all_data. There are roughly 4,000 outlier values that need to be inserted into df_all_data in the correct row and column, and I don't want to add any rows or columns. If anyone can shed light on whether this is possible with one of these functions, or whether there are other options, that would be much appreciated. Thank you!

Upvotes: 1

Views: 429

Answers (1)

akrun

Reputation: 887501

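If we want to overwrite only the cells marked 'outlier', we can create a logical matrix that flags those cells (treating NA as FALSE) and use it to index both data frames in the assignment: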

i1 <- df_all_data == 'outlier' & !is.na(df_all_data)
df_all_data[i1] <- df_outliers[i1]
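
For anyone who wants to run this end to end, here is a minimal sketch on the example data from the question. The final conversion step is an addition, not part of the approach above: it assumes the GC columns should end up numeric once the "outlier" strings are gone, and the name gc_cols is only illustrative.

```r
# Example data from the question. The GC columns are character because
# they mix numbers with "" or "outlier".
df_outliers <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c(436955360, "", NA),
  GC2 = c(480996256, "", ""),
  GC3 = c(386362944, "", NA),
  stringsAsFactors = FALSE
)
df_all_data <- data.frame(
  Reference.Mass = c(256.2402292, 257.0324221, 257.0357941),
  GC1 = c("outlier", 6109980, NA),
  GC2 = c("outlier", 7437798, 2721256),
  GC3 = c("outlier", 8958061, NA),
  stringsAsFactors = FALSE
)

# TRUE only where a cell holds the string "outlier"; NA cells become FALSE
i1 <- df_all_data == "outlier" & !is.na(df_all_data)

# Copy across only the flagged cells; every other cell is left untouched
df_all_data[i1] <- df_outliers[i1]

# Assumption: the GC columns should be numeric after the replacement
gc_cols <- setdiff(names(df_all_data), "Reference.Mass")
df_all_data[gc_cols] <- lapply(df_all_data[gc_cols], as.numeric)
```

Because the index is built from df_all_data alone, the blanks and NAs in df_outliers never reach cells that already hold a value, and no rows or columns are added.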

Upvotes: 1
