I have example data like the one below, but my real data contains more than 10,000 rows.
IND snp1 snp2 snp3 snp4 snp5
1 A/G T/T T/C G/C G/G
2 A/A C/C G/G G/G A/A
3 T/T G/G C/C C/C T/T
Now I want to keep only the columns that contain at least one of these genotypes: A/T, A/G, A/C, T/A, T/G, T/C, G/A, G/T, G/C, C/A, C/T, C/G.
Applying this condition to the example, I should end up with snp1, snp3 and snp4, which I can get with this code:
library(dplyr)
select(df, snp1, snp3, snp4)
This works for a small number of columns, but in my case I need to check all 10,000 columns for these genotypes and write the matching ones to a separate file. How can I do this in R? Would the tidyverse package help here, and if so, how? Any help is highly appreciated.
Thanks in advance.
You can use tidy selection in select(). This needs dplyr 1.0.0 or later, which introduced where() for selecting columns with a predicate:
library(dplyr)
# Heterozygous genotypes to look for
cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C", "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")
df %>%
  # Keep snp* columns in which at least one value is in cond
  select(starts_with("snp") & where(~ any(.x %in% cond)))
# snp1 snp3 snp4
# 1 A/G T/C G/C
# 2 A/A G/G G/G
# 3 T/T C/C C/C
Data
df <- structure(list(IND = 1:3, snp1 = c("A/G", "A/A", "T/T"), snp2 = c("T/T",
"C/C", "G/G"), snp3 = c("T/C", "G/G", "C/C"), snp4 = c("G/C", "G/G", "C/C"),
snp5 = c("G/G", "A/A", "T/T")), class = "data.frame", row.names = c(NA, -3L))
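The question also asks how to write the selected columns to a separate file. A minimal sketch of the whole read, filter and write round trip follows, assuming the real genotypes sit in a whitespace-separated text file laid out like the example. The paths in_file and out_file are hypothetical placeholders created with tempfile() so the sketch runs on its own, and keeping the IND column in the output is an assumption about what you want.

```r
library(dplyr)

cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C",
          "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")

# Hypothetical input/output paths; replace them with your own files
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")

# Write the example data to the input file so the sketch is self-contained
writeLines(c("IND snp1 snp2 snp3 snp4 snp5",
             "1 A/G T/T T/C G/C G/G",
             "2 A/A C/C G/G G/G A/A",
             "3 T/T G/G C/C C/C T/T"), in_file)

# Read the genotypes as character columns
geno <- read.table(in_file, header = TRUE, stringsAsFactors = FALSE)

# Keep IND plus every snp* column containing at least one genotype in cond
het <- geno %>%
  select(IND, starts_with("snp") & where(~ any(.x %in% cond)))

# Write the selected columns to a separate whitespace-separated file
write.table(het, out_file, quote = FALSE, row.names = FALSE)
```

Only the filtering step comes from the answer above; the read.table() and write.table() calls are one possible choice of file format.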
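On whether the tidyverse is needed: a base R sketch of the same column filter is possible with vapply(), which returns one TRUE/FALSE per column. It also shows a swapped-in alternative that is not part of the answer above: instead of listing every combination, test for heterozygosity by comparing the allele before the slash with the allele after it. That second rule assumes every genotype is written as exactly one character, a slash and one character, as in the example.

```r
cond <- c("A/T", "A/G", "A/C", "T/A", "T/G", "T/C",
          "G/A", "G/T", "G/C", "C/A", "C/T", "C/G")

df <- data.frame(
  IND  = 1:3,
  snp1 = c("A/G", "A/A", "T/T"),
  snp2 = c("T/T", "C/C", "G/G"),
  snp3 = c("T/C", "G/G", "C/C"),
  snp4 = c("G/C", "G/G", "C/C"),
  snp5 = c("G/G", "A/A", "T/T"),
  stringsAsFactors = FALSE
)

# Restrict the search to the snp* columns
snp_cols <- startsWith(names(df), "snp")

# Option 1: same rule as the dplyr answer, one logical per column
keep_cond <- vapply(df, function(x) any(x %in% cond), logical(1))

# Option 2 (assumes "X/Y" genotypes): heterozygous if the two alleles differ
is_het   <- function(x) substr(x, 1, 1) != substr(x, 3, 3)
keep_het <- vapply(df, function(x) any(is_het(x)), logical(1))

res_cond <- df[, snp_cols & keep_cond, drop = FALSE]
res_het  <- df[, snp_cols & keep_het,  drop = FALSE]
```

Both options keep snp1, snp3 and snp4 here. Option 2 would also flag other mismatching codes (for example N/A), which the explicit list would not, so pick whichever rule matches your data.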