Reputation: 2690
I have a dataframe with information on several genes in a format similar to:
chr start end Gene Region
1 100 110 Bat Exon
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
1 900 980 Mit Promoter, Upstream
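To make the example reproducible, the table above can be built directly as a data frame. This is a sketch. The name df matches the one used in the answer below, and stringsAsFactors = FALSE is an assumption (it has been the default since R 4.0). The columns are given explicitly because reading the pasted text with read.table would split values like "Upstream, Downstream" at the space.

```r
# Reproducible copy of the example table from the question
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)
str(df)
```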
I would like to subset the data to remove every row belonging to a gene that has "Exon" or "Promoter" in the Region column in any of its rows. I had been using:
Regions <- subset(Table, Region == "Intron" | Region== "DownStream" | Region =="Upstream" | Region=="DownStream,Upstream")
However, this gives me:
chr start end Gene Region
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
What I want is:
chr start end Gene Region
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
Upvotes: 0
Views: 234
Reputation: 70256
Try this using grepl:
df[!grepl("Exon|Promoter", df$Region),]
# chr start end Gene Region
#2 1 120 130 Bat Intron
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
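To see why row 2 survives this first version, it helps to look at the logical vector grepl() returns: one TRUE/FALSE per row, where the pattern "Exon|Promoter" is a regular-expression alternation matching either word. The sketch below rebuilds the example data so it runs on its own; the variable name hit is only illustrative.

```r
# Example data from the question
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does Region mention Exon or Promoter?
hit <- grepl("Exon|Promoter", df$Region)
print(hit)  # TRUE FALSE FALSE FALSE TRUE

# Row-level filter: only the matching rows themselves are dropped,
# so Bat's "Intron" row (row 2) is kept even though Bat has an Exon row
df[!hit, ]
```

The filter decides each row in isolation, so it cannot know that row 2 belongs to a gene flagged elsewhere.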
It's not clear to me why you want row 2 ("Intron") removed as well. Please explain.
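I think I understand now: you want to drop every row of any gene that has "Exon" or "Promoter" in at least one of its rows. Try this instead: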
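# Genes that have at least one "Exon" or "Promoter" row
temp <- df$Gene[grepl("Exon|Promoter", df$Region)]
# Keep only the rows whose gene is not in that set
df[!df$Gene %in% temp,]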
# chr start end Gene Region
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
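The same gene-level rule can also be expressed as a group-wise test with base R's ave(): within each Gene, keep its rows only if none of them mentions Exon or Promoter. This is an alternative sketch, not part of the original answer; on the example data it should select the same rows as the %in% approach. The names clean_row, keep and result are only illustrative.

```r
# Example data from the question
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# TRUE for rows that do NOT mention Exon or Promoter
clean_row <- !grepl("Exon|Promoter", df$Region)

# ave() splits clean_row by Gene, applies all() to each group, and writes
# the per-gene result back onto every row of that gene
keep <- ave(clean_row, df$Gene, FUN = all)

result <- df[keep, ]
print(result)
```

The %in% version is usually easier to read; the ave() form is handy when the per-gene condition is more complex than "appears in some row".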
Upvotes: 2