I have a dataframe and a vector. The vector has about 20 string values, which correspond to parts of the column names in the dataframe. The dataframe has several hundred columns. I have to subset the dataframe based on the partial column names present in the vector.
For example, if one of the column names in the dataframe is GRP20R.45.M, one of the values in the vector will be GRP20R.
Thanks
Assuming that v.names is your vector of names, you can use grepl to filter with a single pattern that joins all the names with "|" (regex alternation):
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]
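Why gsub rather than sub when building the pattern: toString() joins the names with ", ", and sub() replaces only the first separator, so with more than two names the pattern keeps literal ", " pieces and stops matching the later names. A quick check with three illustrative names (GRP40R is made up for this sketch):

```r
v.names <- c("GRP20R", "GRP20KA", "GRP40R")

# sub() replaces only the first ", " produced by toString()
patt_sub  <- sub(",\\s", "|", toString(v.names))
# gsub() replaces every ", "
patt_gsub <- gsub(",\\s", "|", toString(v.names))

patt_sub   # "GRP20R|GRP20KA, GRP40R"
patt_gsub  # "GRP20R|GRP20KA|GRP40R"

# paste(..., collapse = "|") builds the same alternation directly
identical(patt_gsub, paste(v.names, collapse = "|"))  # TRUE
```

The paste() form is what the one-liner at the end of this answer uses, and it avoids the toString() round trip entirely.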
Here is an example:
v.names <- c('GRP20R','GRP20KA')
df <- data.frame(GRP20R.45.M=1,GRP20KA.25.8=2,hh=1)
patt <- gsub(',\\s','|',toString(v.names))
id.group <- grepl(patt,colnames(df))
df[,id.group]
GRP20R.45.M GRP20KA.25.8
1 1 2
where df is:
df
GRP20R.45.M GRP20KA.25.8 hh
1 1 2 1
EDIT: a one-liner solution (thanks @thelatemail):
df[,grepl(paste0(v.names,collapse="|"),colnames(df))]
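Note that the grepl pattern is unanchored, so it matches the names anywhere in a column name. If the vector values are always prefixes (as with GRP20R in GRP20R.45.M), a regex-free sketch using base R's startsWith() is another option. select_by_prefix is a hypothetical helper name, and the extra XGRP20R.1 column is invented here only to show a name that the unanchored pattern would also pick up. drop = FALSE keeps the result a data frame even when only one column matches.

```r
# Hypothetical helper: keep columns whose names start with any of the prefixes
select_by_prefix <- function(df, prefixes) {
  # one logical vector per prefix, combined element-wise with OR
  keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(names(df), p)))
  # drop = FALSE returns a data frame even if a single column is kept
  df[, keep, drop = FALSE]
}

v.names <- c("GRP20R", "GRP20KA")
df <- data.frame(GRP20R.45.M = 1, GRP20KA.25.8 = 2, hh = 1, XGRP20R.1 = 3)

select_by_prefix(df, v.names)
# keeps GRP20R.45.M and GRP20KA.25.8; XGRP20R.1 is excluded
```

Anchoring the regex instead, e.g. paste0("^(", paste(v.names, collapse = "|"), ")"), should give the same selection while staying with grepl.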
Test data:
dat <- data.frame(
GRP20R.30.M="a",
GRP20R.45.M="a",
GRP40R.30.M="b",
GRP40R.45.M="b",
GRP60R.30.M="c",
GRP60R.45.M="c"
)
Extract only the columns that partially match these strings:
strings <- c("GRP20R","GRP60R")
If your column names all had a predictable stem length, you could use:
dat[substr(colnames(dat),1,6) %in% strings]
If you want a more flexible comparison of the part of the column name before the first period (.), you could use:
dat[gsub("(.)?\\..+","\\1",colnames(dat)) %in% strings]
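The regex above keeps the character just before the first period and drops the rest. A possibly easier-to-read pattern simply strips everything from the first period onward; for column names like the ones in the test data it yields the same stems (this sketch checks that against the original expression):

```r
dat <- data.frame(GRP20R.30.M = "a", GRP20R.45.M = "a",
                  GRP40R.30.M = "b", GRP40R.45.M = "b",
                  GRP60R.30.M = "c", GRP60R.45.M = "c")
strings <- c("GRP20R", "GRP60R")

# the stem is everything before the first "."
stems <- sub("\\..*$", "", colnames(dat))
stems  # "GRP20R" "GRP20R" "GRP40R" "GRP40R" "GRP60R" "GRP60R"

# same stems as the gsub("(.)?\\..+", "\\1", ...) version for these names
identical(stems, gsub("(.)?\\..+", "\\1", colnames(dat)))  # TRUE

res <- dat[stems %in% strings]
```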
Both options give the result:
GRP20R.30.M GRP20R.45.M GRP60R.30.M GRP60R.45.M
1 a a c c