millie0725

Reputation: 393

Removing columns that contain values for only 1 row

I have a dataframe with many columns that give compound values for sample IDs. I'm looking to remove any column whose compound is present (non-zero) in only one sample ID, keeping only the columns in which the compound is present in at least 2 sample IDs. Here is a mock dataframe:

df1 <- data.frame(ID = c("A","B","C","D","E","F","G","H","I"),
                  Cmpd_1 = c(5.7,0,0,0,2.5,2.1,0,6.2,1.5),
                  Cmpd_2 = c(0,0,1,0,2.8,0,0,0,0),
                  Cmpd_3 = c(0,0,3.5,0,0,0,0,0,0))

In this example, Cmpd_3 only appears for sample C and I would therefore like the entire column to be removed. Here's what the ideal output would be:

 ID Cmpd_1 Cmpd_2
  A    5.7    0.0
  B    0.0    0.0
  C    0.0    1.0
  D    0.0    0.0
  E    2.5    2.8
  F    2.1    0.0
  G    0.0    0.0
  H    6.2    0.0
  I    1.5    0.0

Upvotes: 0

Views: 49

Answers (2)

akrun

Reputation: 887851

Using dplyr, select only the columns that have more than one non-zero value:

library(dplyr)
df1 %>%
    select(where(~ sum(.x != 0, na.rm = TRUE) > 1))
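A variant of the same idea, as a rough sketch: it assumes the compound columns are all numeric and that any non-numeric column (here just ID) should always be kept, so the non-zero count is only applied to numeric columns. The name df2 is just for illustration.

```r
library(dplyr)

# Mock data from the question
df1 <- data.frame(ID = c("A","B","C","D","E","F","G","H","I"),
                  Cmpd_1 = c(5.7,0,0,0,2.5,2.1,0,6.2,1.5),
                  Cmpd_2 = c(0,0,1,0,2.8,0,0,0,0),
                  Cmpd_3 = c(0,0,3.5,0,0,0,0,0,0))

# Keep every non-numeric column (assumption: these are identifiers),
# plus any numeric column with at least two non-zero values.
df2 <- df1 %>%
    select(where(~ !is.numeric(.x) || sum(.x != 0, na.rm = TRUE) >= 2))

names(df2)  # columns kept: ID, Cmpd_1, Cmpd_2
```

This way the ID column does not depend on how its strings happen to compare against 0.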

Upvotes: 1

Léon Ipdjian

Reputation: 818

This should work if the IDs are unique:

df2 <- df1[, colSums(df1 != 0) > 1]
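If you would rather not rely on the ID column passing the same test, here is a minimal base R sketch that counts non-zero values in the compound columns only and always keeps ID. It assumes the identifier column is literally named "ID" and that NA should be treated as "not present"; cmpd_cols, n_present and keep are illustrative names.

```r
# Mock data from the question
df1 <- data.frame(ID = c("A","B","C","D","E","F","G","H","I"),
                  Cmpd_1 = c(5.7,0,0,0,2.5,2.1,0,6.2,1.5),
                  Cmpd_2 = c(0,0,1,0,2.8,0,0,0,0),
                  Cmpd_3 = c(0,0,3.5,0,0,0,0,0,0))

# All columns except the identifier (assumption: it is called "ID")
cmpd_cols <- setdiff(names(df1), "ID")

# Number of samples in which each compound is non-zero; NA counts as absent
n_present <- colSums(df1[cmpd_cols] != 0, na.rm = TRUE)

# Keep ID plus the compounds present in at least 2 samples
keep <- c("ID", cmpd_cols[n_present >= 2])
df2 <- df1[, keep, drop = FALSE]

names(df2)  # columns kept: ID, Cmpd_1, Cmpd_2
```

drop = FALSE keeps the result a data frame even if only the ID column survives.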

Upvotes: 3
