How to extract all columns from a dataframe based upon partial column names in a vector using R

Question

I have a dataframe and a vector. The vector has about 20 string values, each of which corresponds to part of a column name in the dataframe. The dataframe has several hundred columns. I need to subset the dataframe to the columns whose names contain one of the partial names in the vector.

For example, if one of the column names in the dataframe is GRP20R.45.M, one of the values in the vector will be GRP20R.

Thanks

agstudy · Accepted Answer

Assuming that v.names is your vector of names, you can use grepl to filter the columns with a single pattern that combines all the names:

# join the names into one alternation pattern, e.g. 'GRP20R|GRP20KA'
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]

Here is an example:

v.names <- c('GRP20R','GRP20KA')
df <- data.frame(GRP20R.45.M=1,GRP20KA.25.8=2,hh=1)
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]

  GRP20R.45.M GRP20KA.25.8
1           1            2

where df is:

df
  GRP20R.45.M GRP20KA.25.8 hh
1           1            2  1

EDIT: a one-liner solution (thanks @thelatemail)

df[,grepl(paste0(v.names,collapse="|"),colnames(df))]
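The patterns above match a partial name anywhere in a column name, and df[, id] returns a plain vector when only one column matches. If the partial names are always prefixes (an assumption; the question only shows one such case), a stricter variant can be sketched as follows. The names keep.regex, keep.prefix and res are only illustrative.

```r
# Same toy data as in the example above
v.names <- c('GRP20R', 'GRP20KA')
df <- data.frame(GRP20R.45.M = 1, GRP20KA.25.8 = 2, hh = 1)

# Variant 1: anchor the alternation at the start of the column name
patt <- paste0('^(', paste(v.names, collapse = '|'), ')')
keep.regex <- grepl(patt, colnames(df))

# Variant 2: literal prefix test, no regex involved (base R >= 3.3.0)
keep.prefix <- Reduce(`|`, lapply(v.names, function(p) startsWith(colnames(df), p)))

# drop = FALSE keeps a data frame even when only one column matches
res <- df[, keep.prefix, drop = FALSE]
```

The startsWith variant treats each name literally, so it may be the safer choice if the partial names can contain regex metacharacters such as "." or "+".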
