I have a large data frame with over 122,000 rows and 60 columns. Simplified, it looks like this:
structure(list(mz = c(40, 50, 60, 70, 80, 90),
`sample 1` = c(NA, 51, NA, NA, 675, 12),
`sample 2` = c(NA, 51, NA, NA, 2424, 5),
`Sample 3` = c(NA, 51, NA, 300, 1241, NA),
`Blank Average` = c(10, 20, 50, 78, NA, 0.00333333)),
row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
What I want to do: the function I am writing should create a new data frame in which a row is removed if ALL SAMPLE COLUMNS are NA.
I tried subsetting the entirety of sample columns first:
sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)
Next, to delete rows only when ALL of these sample columns are NA, I tried na_omit. This does not work: it deletes the right rows, but it also deletes rows where just one sample value is NA rather than all of them.
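To make the problem concrete, here is a minimal sketch of that behaviour. It assumes the function meant above is base R's `na.omit`, and it rebuilds the example data as a plain `data.frame` (rather than a tibble) so that no extra packages are needed.

```r
# Example data from the question, rebuilt as a plain data.frame (assumption)
dataframe <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample 1` = c(NA, 51, NA, NA, 675, 12),
  `sample 2` = c(NA, 51, NA, NA, 2424, 5),
  `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
  `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333),
  check.names = FALSE
)

sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)

# na.omit() drops every row that contains at least one NA
omitted <- na.omit(dataframe[sample_cols])

# Only rows 2 and 5 survive; rows 4 and 6 are lost even though
# they still contain at least one real sample value
nrow(omitted)  # 2
```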
I also tried:
Sample_cols_df <- dataframe[sample_cols] # Sample_cols_df holds only the sample columns
Row_filtered <- Sample_cols_df[rowSums(is.na(Sample_cols_df)) != ncol(Sample_cols_df), ]
But I did not really understand this solution, as I'm unfamiliar with rowSums and still new to R. This code did delete the right rows, BUT it also dropped the columns that are not sample columns.
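For illustration, this is a rough sketch of what the rowSums check computes and how the same check might be used to index the full data frame instead of the sample-only subset. The names `na_per_row`, `keep` and `row_filtered` are made up for this example, and the data is rebuilt as a plain `data.frame` rather than a tibble.

```r
# Example data from the question, rebuilt as a plain data.frame (assumption)
dataframe <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample 1` = c(NA, 51, NA, NA, 675, 12),
  `sample 2` = c(NA, 51, NA, NA, 2424, 5),
  `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
  `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333),
  check.names = FALSE
)

sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)

# is.na() returns a TRUE/FALSE matrix; rowSums() counts the TRUEs,
# i.e. the number of NA sample values in each row
na_per_row <- rowSums(is.na(dataframe[sample_cols]))

# A row is kept unless every sample column is NA
keep <- na_per_row != length(sample_cols)

# Index the ORIGINAL data frame with the logical vector,
# so mz and Blank Average are kept as well
row_filtered <- dataframe[keep, ]
```

With the example data, `keep` is FALSE only for rows 1 and 3, so `row_filtered` has four rows and all five columns.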
In short: a row should be removed only when every sample column is NA, and the non-sample columns should be kept.
-> For reference: in my example data frame above, rows 1 and 3 should be removed, because all sample values are NA, even though mz and Blank Average are not. Row 4, for example, should not be removed, because one of the sample values is not NA.
I have already seen a lot of topics about this on StackOverflow, but after a day of searching and trying, I can't seem to find one that exactly matches what I want to do. If anyone has any ideas, please let me know!