DSSS

Reputation: 2051

How to transform a character vector into a regular expression for data frame indexing

I have the following data frame:

df <- structure(list(ID = c(9000099L, 9000296L, 9000622L, 9000798L,
    9001104L, 9001400L), VERSION = structure(c(1L, 1L, 1L, 1L, 1L,
    1L), .Label = "1.2.1", class = "factor"), V01SF1 = c(1L, 2L,
    2L, 3L, 2L, 1L), V01SF2 = c(3L, 3L, 3L, 3L, 3L, 3L), V01BD1 = c(2L,
    3L, 3L, 2L, 3L, 3L), V01BD2 = c(5L, 5L, 5L, 3L, 5L, 5L)), .Names = c("ID",
    "VERSION", "V01SF1", "V01SF2", "V01BD1", "V01BD2"), row.names = c(NA,
    6L), class = "data.frame")

> df
       ID VERSION V01SF1 V01SF2 V01BD1 V01BD2
1 9000099   1.2.1      1      3      2      5
2 9000296   1.2.1      2      3      3      5
3 9000622   1.2.1      2      3      3      5
4 9000798   1.2.1      3      3      2      3
5 9001104   1.2.1      2      3      3      5
6 9001400   1.2.1      1      3      3      5

I would like to index this data frame by the "VERSION" column and the columns whose names contain SF or BD. I have a vector whose elements I would like to use as patterns when searching the names of df:

vars <- c("SF", "BD")

Indexing by VERSION alone is easy:

df[grep("SION", names(df), value = TRUE)]


  VERSION
1   1.2.1
2   1.2.1
3   1.2.1
4   1.2.1
5   1.2.1
6   1.2.1

How can I combine the elements of vars <- c("SF", "BD") with "SION" as grep patterns? The resulting code should be equivalent to df[grep("SION|SF|BD", names(df), value = TRUE)], giving the following output:

  VERSION V01SF1 V01SF2 V01BD1 V01BD2
1   1.2.1      1      3      2      5
2   1.2.1      2      3      3      5
3   1.2.1      2      3      3      5
4   1.2.1      3      3      2      3
5   1.2.1      2      3      3      5
6   1.2.1      1      3      3      5

Thank you very much

Upvotes: 1

Views: 114

Answers (3)

thelatemail

Reputation: 93813

Like this:

vars <- c("SF","BD")
vars
#[1] "SF" "BD"

df[grepl(paste(c("SION",vars),collapse="|"),names(df))]

#  VERSION V01SF1 V01SF2 V01BD1 V01BD2
#1   1.2.1      1      3      2      5
#2   1.2.1      2      3      3      5
#3   1.2.1      2      3      3      5
#4   1.2.1      3      3      2      3
#5   1.2.1      2      3      3      5
#6   1.2.1      1      3      3      5
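A minimal sketch of why the grepl() version works, assuming the df from the question (rebuilt here with data.frame() so the block runs on its own): grepl() returns one TRUE/FALSE per column name, and a logical vector selects the same columns as the character vector that grep(..., value = TRUE) returns.

```r
# Rebuild of the question's data frame (assumption: same values, built with data.frame()).
df <- data.frame(
  ID      = c(9000099L, 9000296L, 9000622L, 9000798L, 9001104L, 9001400L),
  VERSION = factor(rep("1.2.1", 6)),
  V01SF1  = c(1L, 2L, 2L, 3L, 2L, 1L),
  V01SF2  = c(3L, 3L, 3L, 3L, 3L, 3L),
  V01BD1  = c(2L, 3L, 3L, 2L, 3L, 3L),
  V01BD2  = c(5L, 5L, 5L, 3L, 5L, 5L)
)

vars <- c("SF", "BD")
pat  <- paste(c("SION", vars), collapse = "|")   # "SION|SF|BD"

# grepl() gives one TRUE/FALSE per column name ...
keep <- grepl(pat, names(df))
keep   # FALSE TRUE TRUE TRUE TRUE TRUE

# ... while grep(value = TRUE) gives the matching names themselves.
cols <- grep(pat, names(df), value = TRUE)

# Both select the same columns in the same order.
identical(names(df[keep]), names(df[cols]))   # TRUE
```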

Upvotes: 2

Ricardo Oliveros-Ramos

Reputation: 4339

Try this:

vars<- c ("SF", "BD")
version = "VERSION"

pattern = paste(c(version, vars), collapse="|")

> pattern
[1] "VERSION|SF|BD"

ind = grep(pattern, names(df), value=TRUE)

> ind
[1] "VERSION" "V01SF1"  "V01SF2"  "V01BD1"  "V01BD2" 

The trick is that the first argument of grep (pattern) is just a character string containing a regular expression, so you can build that regular expression with paste(..., collapse = "|"). Then index your data.frame with ind.
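A small sketch of the collapse step: without collapse, paste() returns one string per element, and grep() uses only the first element of a longer pattern vector (with a warning). With collapse = "|" the pieces become a single alternation. The column names are typed in directly here as an assumption, so the block runs without df.

```r
version <- "VERSION"
vars    <- c("SF", "BD")

# Without collapse, paste() keeps one string per element (length 3).
separate <- paste(c(version, vars))

# With collapse = "|", the elements are joined into one string, which grep()
# reads as an alternation: match "VERSION" or "SF" or "BD".
pattern <- paste(c(version, vars), collapse = "|")
pattern   # "VERSION|SF|BD"

# Column names from the question (typed in so the block is self-contained).
nms <- c("ID", "VERSION", "V01SF1", "V01SF2", "V01BD1", "V01BD2")
grep(pattern, nms, value = TRUE)
```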

dfx = df[, ind]


> dfx
  VERSION V01SF1 V01SF2 V01BD1 V01BD2
1   1.2.1      1      3      2      5
2   1.2.1      2      3      3      5
3   1.2.1      2      3      3      5
4   1.2.1      3      3      2      3
5   1.2.1      2      3      3      5
6   1.2.1      1      3      3      5
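One caveat with df[, ind], sketched under the assumption of the same df: here five columns match, so the result is a data.frame, but if the pattern matched only one column, the default drop = TRUE would return a bare vector. Passing drop = FALSE, or using list-style df[ind], keeps a data.frame.

```r
# Rebuild of the question's data frame (assumption: same values, built with data.frame()).
df <- data.frame(
  ID      = c(9000099L, 9000296L, 9000622L, 9000798L, 9001104L, 9001400L),
  VERSION = factor(rep("1.2.1", 6)),
  V01SF1  = c(1L, 2L, 2L, 3L, 2L, 1L),
  V01SF2  = c(3L, 3L, 3L, 3L, 3L, 3L),
  V01BD1  = c(2L, 3L, 3L, 2L, 3L, 3L),
  V01BD2  = c(5L, 5L, 5L, 3L, 5L, 5L)
)

ind <- grep("VERSION", names(df), value = TRUE)  # a single match

x1 <- df[, ind]                # drop = TRUE by default: returns the factor column
x2 <- df[, ind, drop = FALSE]  # stays a one-column data.frame
x3 <- df[ind]                  # list-style indexing never drops

class(x1)  # "factor"
class(x2)  # "data.frame"
class(x3)  # "data.frame"
```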

Upvotes: 3

G. Grothendieck

Reputation: 269481

First define s as:

s <- c("SION", vars)

Now try:

g <- sapply(s, grepl, names(df))
df[ apply(g, 1, any) ]

or

df[ unlist(sapply(s, grep, names(df))) ]

or

df[ unlist(Vectorize(function(s) grep(s, names(df)))(s)) ]

or

pat <- paste(s, collapse = "|")
df[ grepl(pat, names(df)) ]
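A sketch of two things worth checking with these approaches, assuming the df from the question (rebuilt with data.frame()). First, unlist(sapply(s, grep, names(df))) concatenates the indices pattern by pattern, so a column matched by several patterns appears more than once; the pattern "V01" below is a hypothetical addition chosen to overlap with SF and BD. Second, every element of s is read as a regular expression, so a metacharacter such as "." matches any character; grepl()'s fixed = TRUE argument matches literally instead. select_cols() is a hypothetical helper, not part of base R.

```r
# Rebuild of the question's data frame (assumption: same values, built with data.frame()).
df <- data.frame(
  ID      = c(9000099L, 9000296L, 9000622L, 9000798L, 9001104L, 9001400L),
  VERSION = factor(rep("1.2.1", 6)),
  V01SF1  = c(1L, 2L, 2L, 3L, 2L, 1L),
  V01SF2  = c(3L, 3L, 3L, 3L, 3L, 3L),
  V01BD1  = c(2L, 3L, 3L, 2L, 3L, 3L),
  V01BD2  = c(5L, 5L, 5L, 3L, 5L, 5L)
)

# 1. Overlapping patterns: the hypothetical "V01" also matches the SF and BD columns.
s2  <- c("SION", "SF", "V01")
idx <- unlist(sapply(s2, grep, names(df)))
length(idx)                      # 7: columns 3 and 4 are counted twice
idx_unique <- sort(unique(idx))  # 2 3 4 5 6
names(df[idx_unique])

# 2. Literal matching. select_cols() is a hypothetical helper, not base R.
select_cols <- function(df, patterns, fixed = FALSE) {
  # One logical vector per pattern; fixed = TRUE disables regex interpretation.
  hits <- lapply(patterns, grepl, x = names(df), fixed = fixed)
  # Keep a column if any pattern matched it; original column order is preserved.
  df[Reduce(`|`, hits)]
}

d <- data.frame(a.b = 1, axb = 2, c = 3)
names(select_cols(d, "a.b"))                # "a.b" "axb"  ("." matches any character)
names(select_cols(d, "a.b", fixed = TRUE))  # "a.b"
names(select_cols(df, c("SION", "SF", "BD")))
```

Reduce(`|`, ...) combines the per-pattern logical vectors, so unlike the unlist(sapply()) version it cannot select a column twice.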

Upvotes: 1
