Ama

Reputation: 71

Remove rows in a dataframe if ALL values in a selection of columns are NA

I have a large dataframe of over 122000 rows and 60 columns, but simplified this is what the dataframe looks like:

structure(list(mz = c(40, 50, 60, 70, 80, 90),
    `sample 1` = c(NA, 51, NA, NA, 675, 12),
    `sample 2` = c(NA, 51, NA, NA, 2424, 5),
    `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
    `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333)),
    row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

What I want to do: the function I am writing should create a new data frame in which a row is removed when ALL SAMPLE COLUMNS are NA.

I first tried selecting all of the sample columns:

sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)

Next, to delete rows ONLY when ALL of these sample columns are NA, I tried na.omit. This does not work: it deletes the rows I want gone, but it also deletes rows where just one sample value is NA rather than all of them.
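To illustrate the problem, here is a small sketch on the example data. It rebuilds the example as a plain data.frame (an assumption made so that no extra package is needed; `omitted` and `kept_rows` are names introduced only for this illustration):

```r
# Example data from above, as a plain data.frame (assumption: tibble not required).
dataframe <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample 1` = c(NA, 51, NA, NA, 675, 12),
  `sample 2` = c(NA, 51, NA, NA, 2424, 5),
  `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
  `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333),
  check.names = FALSE  # keep column names with spaces as they are
)
sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)

# na.omit() drops every row that contains at least one NA.
omitted <- na.omit(dataframe[sample_cols])
kept_rows <- as.integer(rownames(omitted))
kept_rows  # rows 2 and 5 only; rows 4 and 6 are lost although they hold data
```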

I also tried:

Sample_cols_df <- dataframe[sample_cols]  # Sample_cols_df holds only the sample columns
Row_filtered <- Sample_cols_df[rowSums(is.na(Sample_cols_df)) != ncol(Sample_cols_df), ]

But I did not really understand this solution, as I'm unfamiliar with rowSums and still new to R. This code did delete the right rows, BUT it also dropped the columns that are not sample columns.
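A sketch of what this attempt computes on the example data, again using a plain data.frame as an assumption; the intermediate names `na_per_row` and `keep` are introduced only for illustration:

```r
# Example data from above, as a plain data.frame (assumption: tibble not required).
dataframe <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample 1` = c(NA, 51, NA, NA, 675, 12),
  `sample 2` = c(NA, 51, NA, NA, 2424, 5),
  `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
  `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333),
  check.names = FALSE
)
sample_cols <- grep("sample", names(dataframe), ignore.case = TRUE)
Sample_cols_df <- dataframe[sample_cols]

# is.na() gives a TRUE/FALSE matrix; rowSums() counts the TRUEs in each row.
na_per_row <- rowSums(is.na(Sample_cols_df))

# A row is kept when not every sample column is NA.
keep <- na_per_row != ncol(Sample_cols_df)

# The rows are right, but the subset is taken from Sample_cols_df,
# so mz and Blank Average are no longer present.
Row_filtered <- Sample_cols_df[keep, ]
names(Row_filtered)
```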

In short:

-> For reference: in the example dataframe above, rows 1 and 3 should be removed, because all of their sample values are NA, even though mz and Blank Average are not. Row 4, for example, should not be removed, because one of its sample values is a result and not NA.

I already noticed a lot of topics on this on StackOverflow, but after a day of searching and trying, I can't seem to find a topic that exactly matches what I want to do. In case anyone has any ideas please let me know!

Upvotes: 1

Views: 70

Answers (1)

akrun

Reputation: 887118

We can use

df1[rowSums(!is.na(df1[sample_cols])) > 0, ]
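As a rough sketch of the idea on the question's example data: count the non-NA sample values per row, then keep the rows of the full data frame where that count is above zero. Here `df1` is assumed to be the example data rebuilt as a plain data.frame, and `n_present` and `result` are names introduced only for illustration:

```r
# Example data from the question, as a plain data.frame (assumption).
df1 <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample 1` = c(NA, 51, NA, NA, 675, 12),
  `sample 2` = c(NA, 51, NA, NA, 2424, 5),
  `Sample 3` = c(NA, 51, NA, 300, 1241, NA),
  `Blank Average` = c(10, 20, 50, 78, NA, 0.00333333),
  check.names = FALSE
)
sample_cols <- grep("sample", names(df1), ignore.case = TRUE)

# Number of non-NA sample values in each row: 0 3 0 1 3 2
n_present <- rowSums(!is.na(df1[sample_cols]))

# Index the full data frame, so mz and Blank Average are kept.
result <- df1[n_present > 0, ]
result$mz  # 50 70 80 90
```

Because the logical index is applied to `df1` itself rather than to the sample-only subset, all columns survive and only rows 1 and 3 are dropped.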

Upvotes: 2
