Reputation: 1
I have a metadata file stored as a .tsv, which I read into R and save as META. I need to extract all rows containing a given string, "male", here stored in the variable sample.
The full script has a lot of these operations and so it's important that I store the pattern in sample below. The errors are in the way I am trying to grep.
IN <- "/home/zchadva/Scratch/output/cov"
#metadata
META <- read.table("/home/zchadva/Scratch/data/hipsci/rnaseq/hipsci.qc1_sample_info.20160926.tsv", header = TRUE, sep = "\t")
#Set study/table variables
sample <- "\\<male\\>"
control <- "female"
#Grep all rows containing "male" from the table META
sample.list <- META[grep(sample, META, value=TRUE)]
Ideally I do not want to use META$Gender to specify a column each time I need to do a pattern search, as our real metadata file is humongous. If I do need to specify one, I would like to have Gender in a variable.
sample.list <- META[grep(sample, META$Gender), ]
For example:
column <- "Gender"
sample.list <- META[grepl(sample, META$column), ]
#Table example simplified
ID Disease Gender Cell
JX1 ibd male liver
PTY healthy male liver
HB3 ibd female brain
PO3 bbs male
#Desired layout in sample.list
JX1 ibd male liver
PTY healthy male liver
PO3 bbs male
Any help is greatly appreciated. I have been trying to do this for hours.
Upvotes: 0
Views: 2354
Reputation: 17369
grepl will give you better results than grep, since you can use the logical vector to index your data frame.
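As a rough illustration of the difference, here is a small sketch on a stand-alone vector (the name gender is made up for this example). grep returns the positions of the matches, grepl returns one TRUE/FALSE per element. Both can index, but the logical vector is safer when you negate it and nothing matched.

```r
# Hypothetical stand-alone vector, not taken from the real metadata file.
gender <- c("male", "male", "female", "male")

pos <- grep("^male", gender)    # integer positions of the matching elements
lgl <- grepl("^male", gender)   # one logical value per element

# Negation when the pattern matches nothing:
none.pos <- grep("^other", gender)           # integer(0)
dropped.by.pos <- gender[-none.pos]          # selects nothing at all
dropped.by.lgl <- gender[!grepl("^other", gender)]  # keeps every element
```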
META <-
data.frame(ID = c("JX1", "PTY", "HB3", "PO3"),
Disease = c("ibd", "healthy", "ibd", "bbs"),
Gender = c("male", "male", "female", "male"),
Cell = c("liver", "liver", "brain", "liver"))
sample <- "male"
control <- "female"
META[grepl("^male", META$Gender), ]
ID Disease Gender Cell
1 JX1 ibd male liver
2 PTY healthy male liver
4 PO3 bbs male liver
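To cover the two variations asked about in the question, here is a minimal sketch, assuming the same toy data as above. The variable column is a hypothetical name holding the column to search. The first part uses [[ ]], which accepts a column name stored in a character variable (META$column would look for a column literally called "column"). The second part avoids naming a column by testing every column and keeping rows with at least one match, which may be slow on a very large table.

```r
# Toy data mirroring the example above; stringsAsFactors = FALSE keeps text as character.
META <- data.frame(ID = c("JX1", "PTY", "HB3", "PO3"),
                   Disease = c("ibd", "healthy", "ibd", "bbs"),
                   Gender = c("male", "male", "female", "male"),
                   Cell = c("liver", "liver", "brain", "liver"),
                   stringsAsFactors = FALSE)

sample <- "\\<male\\>"   # word-boundary pattern from the question; does not match "female"
column <- "Gender"       # hypothetical variable holding the column name

# 1. Column named in a variable: index with [[ ]] instead of $.
by.column <- META[grepl(sample, META[[column]]), ]

# 2. No column specified: run grepl on each column, then OR the results row-wise.
hits <- Reduce(`|`, lapply(META, function(x) grepl(sample, as.character(x))))
any.column <- META[hits, ]
```

Both by.column and any.column should contain the rows for JX1, PTY and PO3 on this toy data.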
Upvotes: 1