Emma Tebbs

Reputation: 1467

Using grep with multiple patterns in R to find matching strings

If I have a vector of strings:

dd <- c("sflxgrbfg_sprd_2011","sflxgrbfg_sprd2_2011","sflxgrbfg_sprd_2012")

and want to find the entries with '2011' in the string, I can use

ifiles <- dd[grep("2011",dd)]

How do I search for entries with a combination of strings included, without using a loop?

For example, I would like to find the entries with both '2011' and 'sprd' in the string, which in this case will only return

sflxgrbfg_sprd_2011

How can this be done? I could define a variable

toMatch <- c('2011','sprd')

and then loop through the entries, but I was hoping there was a better solution.

Note: To make this useful for different strings, is it also possible to determine which entries contain these strings regardless of the order in which they appear? For example, 'sflxlgrbfg_2011_sprd'.

Upvotes: 1

Views: 151

Answers (3)

Mikko Marttila

Reputation: 11878

If you want a scalable solution, you can use lapply, Reduce and intersect to:

  1. For each expression in toMatch, find the indices of all matches in dd.
  2. Keep only those indices that are found for all expressions in toMatch.

dd <- c("sflxgrbfg_sprd_2011","sflxgrbfg_sprd2_2011","sflxgrbfg_sprd_2012")
dd <- c(dd, "sflxgrbfh_sprd_2011")
toMatch <- c('bfg', '2011','sprd')

dd[Reduce(intersect, lapply(toMatch, grep, dd))]
#> [1] "sflxgrbfg_sprd_2011"  "sflxgrbfg_sprd2_2011"

Created on 2018-03-07 by the reprex package (v0.2.0).
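
The same two steps can also be written with logical vectors instead of index sets. This is only a minimal sketch, not part of the answer as posted: `match_all` is a hypothetical helper name, and it assumes every element of `patterns` is a valid regular expression.

```r
# Hypothetical helper: keep the elements of x that match every pattern.
match_all <- function(patterns, x) {
  # One logical vector per pattern, each the same length as x.
  hits <- lapply(patterns, grepl, x = x)
  # Combine them element-wise with AND.
  x[Reduce(`&`, hits)]
}

dd <- c("sflxgrbfg_sprd_2011", "sflxgrbfg_sprd2_2011",
        "sflxgrbfg_sprd_2012", "sflxgrbfh_sprd_2011")
toMatch <- c("bfg", "2011", "sprd")

result <- match_all(toMatch, dd)
print(result)
```

Because each pattern is tested independently, the order in which the substrings appear in a string does not matter.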

Upvotes: 0

akrun

Reputation: 887048

Try

grep('2011_sprd|sprd_2011', dd, value=TRUE)
#[1] "sflxgrbfg_sprd_2011"  "sflxlgrbfg_2011_sprd"

Or, for strings where other characters may sit between the two tokens, use lookarounds (this needs perl=TRUE):

grep('(?<=sprd_).*(?=2011)|(?<=2011_).*(?=sprd)', dd1,
     value=TRUE, perl=TRUE)
#[1] "sflxgrbfg_sprd_2011"       "sflxlgrbfg_2011_sprd"
#[3] "sfxl_2011_14334_sprd"      "sprd_124334xsff_2011_1423"

data

dd <- c("sflxgrbfg_sprd_2011","sflxgrbfg_sprd2_2011","sflxgrbfg_sprd_2012", 
"sflxlgrbfg_2011_sprd")

dd1 <- c(dd,  "sfxl_2011_14334_sprd", "sprd_124334xsff_2011_1423")
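
The alternation above needs one branch per ordering, which grows quickly with more tokens. A possible alternative, offered only as a sketch and not taken from the answer, is to build one lookahead per token so that the order stops mattering. The names `tokens` and `pattern` are illustrative, and the `(?:_|$)` suffix is an assumption that 'sprd' should only count when followed by an underscore or the end of the string.

```r
dd1 <- c("sflxgrbfg_sprd_2011", "sflxgrbfg_sprd2_2011", "sflxgrbfg_sprd_2012",
         "sflxlgrbfg_2011_sprd", "sfxl_2011_14334_sprd",
         "sprd_124334xsff_2011_1423")

# Assumption: "sprd" must be followed by "_" or the end of the string.
tokens <- c("2011", "sprd(?:_|$)")

# One lookahead per token, all anchored at the start of the string,
# so every token must occur somewhere, in any order.
pattern <- paste0("^", paste0("(?=.*", tokens, ")", collapse = ""))

result <- grep(pattern, dd1, value = TRUE, perl = TRUE)
print(result)
```

With the data above this should select the same four strings as the lookbehind version, while leaving out the 'sprd2' entry.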

Upvotes: 2

cole

Reputation: 586

If you want to match more than one pattern, try indexing with a logical vector (from grepl) rather than with numeric positions (from grep). That way you can build an "and" condition with &, so only the strings containing both patterns are extracted.

ifiles <- dd[grepl("2011",dd) & grepl("sprd_",dd)]
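
A short sketch of how this behaves when the order of the tokens changes, using the question's extra example string. The names `strict` and `relaxed` are illustrative, and the relaxed pattern rests on an assumption about what should count as a match.

```r
dd <- c("sflxgrbfg_sprd_2011", "sflxgrbfg_sprd2_2011",
        "sflxgrbfg_sprd_2012", "sflxlgrbfg_2011_sprd")

# The literal "sprd_" misses strings where "sprd" is the last token.
strict <- dd[grepl("2011", dd) & grepl("sprd_", dd)]

# Assumption: "sprd" counts when followed by "_" or the end of the string.
relaxed <- dd[grepl("2011", dd) & grepl("sprd(_|$)", dd)]

print(strict)
print(relaxed)
```

Here `strict` should contain only "sflxgrbfg_sprd_2011", while `relaxed` should also pick up "sflxlgrbfg_2011_sprd".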

Upvotes: 3
