user466663

Reputation: 845

How to extract all columns from a dataframe based upon partial column names in a vector using R

I have a dataframe and a vector. The vector has about 20 string values, each of which corresponds to part of a column name in the dataframe. The dataframe has several hundred columns. I need to subset the dataframe, keeping only the columns whose names contain one of the partial names in the vector.

For example, if one of the column names in the dataframe is GRP20R.45.M, one of the values in the vector will be GRP20R.

Thanks

Upvotes: 2

Views: 1886

Answers (2)

agstudy

Reputation: 121568

Assuming that v.names is your vector of names, you can collapse it into a single alternation pattern and use grepl to filter the columns:

patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]

Here is an example:

v.names <- c('GRP20R','GRP20KA')
df <- data.frame(GRP20R.45.M=1,GRP20KA.25.8=2,hh=1)
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]

  GRP20R.45.M GRP20KA.25.8
1           1            2

where df is:

df
  GRP20R.45.M GRP20KA.25.8 hh
1           1            2  1

EDIT: a one-line solution (thanks @thelatemail):

df[,grepl(paste0(v.names,collapse="|"),colnames(df))]
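One caveat with the pattern approach: grepl matches anywhere in the column name and treats the values as regular expressions, so a column such as XGRP20R.1 would also be picked up. If the values are always the literal stem before the first period (an assumption, the question does not say so explicitly), a minimal sketch using base R's startsWith avoids regex altogether. The extra column XGRP20R.1 is made up for illustration, and drop = FALSE keeps the result a data frame even when only one column matches:

```r
# Illustrative data; XGRP20R.1 is a hypothetical column added to show the difference
v.names <- c("GRP20R", "GRP20KA")
df <- data.frame(GRP20R.45.M = 1, GRP20KA.25.8 = 2, XGRP20R.1 = 3, hh = 4)

# Assumption: each value is followed by a literal period in the column name
prefixes <- paste0(v.names, ".")

# One logical vector per prefix, then combine them with element-wise OR
matches <- lapply(prefixes, function(p) startsWith(colnames(df), p))
keep <- Reduce(`|`, matches)

# drop = FALSE keeps a data frame even if a single column is selected
res <- df[, keep, drop = FALSE]
colnames(res)
```

Here the unanchored pattern GRP20R|GRP20KA would also select XGRP20R.1, while the prefix test does not.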

Upvotes: 4

thelatemail

Reputation: 93813

Test data:

dat <-  data.frame(
          GRP20R.30.M="a",
          GRP20R.45.M="a",
          GRP40R.30.M="b",
          GRP40R.45.M="b",
          GRP60R.30.M="c",
          GRP60R.45.M="c"
        )

Only extract the columns partially matching the below strings:

strings <- c("GRP20R","GRP60R")

If your column names all had a predictable stem length, you could use:

dat[substr(colnames(dat),1,6) %in% strings]

If you wanted to compare more flexibly, using the part of the column name before the first period (.), you could use:

dat[gsub("(.)?\\..+","\\1",colnames(dat)) %in% strings]

Both options give the result:

  GRP20R.30.M GRP20R.45.M GRP60R.30.M GRP60R.45.M
1           a           a           c           c
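The stem extraction can also be written with a simpler pattern. The following is only a sketch of an equivalent alternative, not part of the original answer: sub removes everything from the first period onward, and the remaining stem is compared against strings with %in%. The variable name stems is introduced here for illustration.

```r
# Same test data as above
dat <- data.frame(
  GRP20R.30.M = "a", GRP20R.45.M = "a",
  GRP40R.30.M = "b", GRP40R.45.M = "b",
  GRP60R.30.M = "c", GRP60R.45.M = "c"
)
strings <- c("GRP20R", "GRP60R")

# Strip the first period and everything after it to get the stem
stems <- sub("\\..*$", "", colnames(dat))

# Keep only the columns whose stem is exactly one of the requested strings
res <- dat[stems %in% strings]
colnames(res)
```

Because the comparison is an exact match on the stem, a value such as GRP20 would not accidentally select GRP20R columns.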

Upvotes: 2
