How to select columns containing different text combinations from a data frame?

Question

I have example data like the one below, but my real data contains more than 10,000 rows.

IND  snp1      snp2    snp3     snp4    snp5
1    A/G       T/T     T/C      G/C     G/G
2    A/A       C/C     G/G      G/G     A/A
3    T/T       G/G     C/C      C/C     T/T

Now I want to keep only the columns that contain any of these values (A/T, A/G, A/C, T/A, T/G, T/C, G/A, G/T, G/C, C/A, C/T, C/G). Applying that condition to the example, I should end up with snp1, snp3 and snp4, which I can get with this code:

select(df, snp1, snp3, snp4)[1, 1:3]

This works for a small number of columns, but in my case I need to check all 10,000 columns for these values and write the matching columns to a separate file. How can I do this in R? Would the tidyverse package be helpful here? If so, please let me know how to use it. Any help would be highly appreciated. Thanks in advance.

Darren Tsai · Accepted Answer

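You can use tidy selection in select(): starts_with("snp") restricts the search to the SNP columns, and where() keeps only those columns in which at least one value is in cond.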

library(dplyr)
cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C", "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")

df %>%
  select(starts_with("snp") & where(~ any(.x %in% cond)))

#   snp1 snp3 snp4
# 1  A/G  T/C  G/C
# 2  A/A  G/G  G/G
# 3  T/T  C/C  C/C
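
The question also asks for the matching columns to be written to a separate file. The following is a minimal sketch of that step, not part of the original answer. It uses base R only, so the column test is a vapply() equivalent of the where() condition above, and out_file is a hypothetical path (a temporary file here) that you would replace with your own file name.

```r
# Example data, copied from the Data section of the answer.
df <- data.frame(
  IND  = 1:3,
  snp1 = c("A/G", "A/A", "T/T"),
  snp2 = c("T/T", "C/C", "G/G"),
  snp3 = c("T/C", "G/G", "C/C"),
  snp4 = c("G/C", "G/G", "C/C"),
  snp5 = c("G/G", "A/A", "T/T"),
  stringsAsFactors = FALSE
)
cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C",
          "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")

# Names of the columns that start with "snp".
snp_cols <- grep("^snp", names(df), value = TRUE)

# One TRUE/FALSE per SNP column: does it contain at least one value from cond?
keep <- vapply(df[snp_cols], function(col) any(col %in% cond), logical(1))

# Keep only the matching columns (drop = FALSE keeps a data frame
# even if a single column matches).
selected <- df[, snp_cols[keep], drop = FALSE]

# Hypothetical output path: replace tempfile() with a real file name.
out_file <- tempfile(fileext = ".txt")
write.table(selected, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the dplyr version, the same write.table() call can be applied to the result of the select() pipeline instead of selected.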

Data

df <- structure(list(IND = 1:3, snp1 = c("A/G", "A/A", "T/T"), snp2 = c("T/T", 
"C/C", "G/G"), snp3 = c("T/C", "G/G", "C/C"), snp4 = c("G/C", "G/G", "C/C"),
snp5 = c("G/G", "A/A", "T/T")), class = "data.frame", row.names = c(NA, -3L))
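
As a side note, the 12 values in cond are exactly the ordered pairs of two different bases among A, T, G and C, so the vector can be generated instead of typed by hand. This is only a sketch under that assumption; bases, all_pairs and cond_generated are illustrative names that do not appear in the original answer.

```r
# The four bases that appear in the genotype strings.
bases <- c("A", "T", "G", "C")

# 4 x 4 matrix of every "X/Y" combination, including A/A, T/T, etc.
all_pairs <- outer(bases, bases, paste, sep = "/")

# Drop the diagonal (identical bases) to keep the 12 mixed pairs.
cond_generated <- all_pairs[row(all_pairs) != col(all_pairs)]

# The hand-typed vector from the answer, for comparison.
cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C",
          "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")

# Same set of values, although the order of the elements differs.
same_values <- setequal(cond_generated, cond)
```

Because %in% ignores order, cond_generated can be used in place of cond in the select() call.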
