Reputation: 337
I have two lists (more precisely, atomic character vectors) that I want to compare using regular expressions to produce a subset of one of them. I can do this with a 'for' loop, but is there simpler code? The following example illustrates my case:
# list of unique cities
city <- c('Berlin', 'Perth', 'Oslo')
# list of city-months, like 'New York-Dec'
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')
# wanted: the subset of 'temp' for month 'Jan' only, restricted to the cities in 'city':
# 'Berlin-Jan', 'Oslo-Jan'
Added clarification: In the actual case I need code for, the equivalent of the 'month' part is more complex: a fairly random alphanumeric string in which only the first two characters carry the information I care about (they have to be '01').
Added actual case example:
# equivalent of 'city' in the first example
# values match pattern TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')
# equivalent of 'temp' in the first example
# values match pattern TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}-[\d]{2}[0-9A-Z]+
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76', 'TCGA-78-8904-11A70')
# sub-set wanted (must have '01' after the 'patient' ID part)
# 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76'
Upvotes: 2
Views: 238
Reputation: 118839
Something like this?
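# keep only the 'Jan' entries (stored separately so 'temp' is not overwritten)
jan <- temp[grepl("Jan", temp)]
# take the part before the '-' and keep the entries whose city is in 'city'
jan[sapply(strsplit(jan, "-"), "[[", 1) %in% city]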
# [1] "Berlin-Jan" "Oslo-Jan"
Even better, borrowing the idea from @agstudy:
temp[temp %in% paste0(city, "-Jan")]
# [1] "Berlin-Jan" "Oslo-Jan"
Edit: How about this?
sample[gsub("(.*-01).*$", "\\1", sample) %in% paste0(patient, "-01")]
# [1] "TCGA-43-4897-01A159" "TCGA-65-4897-01T76"
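Since the question asks about regular expressions, here is a minimal sketch of a more direct variant: build one anchored alternation from the patient IDs and test each sample against it with grepl. The names `pattern` and `matched` are illustrative only, and the sketch assumes the IDs contain no regex metacharacters (true for the TCGA-style IDs shown, which are only letters, digits and hyphens).

```r
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76', 'TCGA-78-8904-11A70')

# One pattern of the form ^(ID1|ID2|...)-01
# '^' anchors the match at the start, so '-01' must directly follow the patient ID
pattern <- paste0("^(", paste(patient, collapse = "|"), ")-01")

# grepl returns one TRUE/FALSE per sample; use it as a logical index
matched <- sample[grepl(pattern, sample)]
matched
# [1] "TCGA-43-4897-01A159" "TCGA-65-4897-01T76"
```

Anchoring on the patient ID avoids depending on where the last '-01' occurs in the string, which matters if the trailing part could ever contain '-01' itself.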
Upvotes: 4
Reputation: 4603
Here's another solution, updated for your new requirements:
sample[na.omit(pmatch(paste0(patient, '-01'), sample))]
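One caveat worth illustrating: pmatch returns NA when a prefix partially matches more than one element, so the line above finds at most one sample per patient and silently drops patients with several '01' samples. The sketch below uses hypothetical data (`sample2`, with two '01' samples for one patient) to show this, alongside a fixed-width alternative based on substr. The alternative assumes every ID has exactly the TCGA-XX-XXXX layout (12 characters), which the question's pattern suggests but does not guarantee.

```r
patient <- c('TCGA-43-4897', 'TCGA-65-4897')
# hypothetical data: patient TCGA-43-4897 has two '01' samples
sample2 <- c('TCGA-43-4897-01A159', 'TCGA-43-4897-01B200',
             'TCGA-65-4897-01T76', 'TCGA-65-4897-11A70')

# pmatch: the ambiguous prefix 'TCGA-43-4897-01' yields NA and is dropped
via_pmatch <- sample2[na.omit(pmatch(paste0(patient, '-01'), sample2))]
via_pmatch
# [1] "TCGA-65-4897-01T76"

# fixed-width alternative (assumes 12-character patient IDs)
id   <- substr(sample2, 1, 12)   # patient ID part
code <- substr(sample2, 14, 15)  # two characters after the separating '-'
via_substr <- sample2[id %in% patient & code == "01"]
via_substr
# [1] "TCGA-43-4897-01A159" "TCGA-43-4897-01B200" "TCGA-65-4897-01T76"
```

If each patient is known to have at most one '01' sample, the pmatch one-liner is fine as it stands.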
Upvotes: 3
Reputation: 16026
Here's a solution with two partial string matches...
temp[agrep("Jan",temp)[which(agrep("Jan",temp) %in% sapply(city, agrep, x=temp))]]
# [1] "Berlin-Jan" "Oslo-Jan"
As a function just for fun...
fun <- function(x, y, pattern) y[agrep(pattern, y)[which(agrep(pattern, y) %in% sapply(x, agrep, x = y))]]
# x is the vector of values to filter by (here 'city')
# y is the vector to be filtered (here 'temp')
# pattern is the quoted pattern that the elements of y must also match
# note: agrep does approximate matching, so near-misses can match too
fun(city, temp, "Jan")
# [1] "Berlin-Jan" "Oslo-Jan"
Upvotes: 1
Reputation: 121588
You can use gsub:
x <- gsub(paste(paste(city,collapse='-Jan|'),'-Jan',sep=''),1,temp)
temp[x==1]
# [1] "Berlin-Jan" "Oslo-Jan"
The pattern here is:
"Berlin-Jan|Perth-Jan|Oslo-Jan"
Upvotes: 2