Neal Barsch

Reputation: 2940

Filter dataframe by vector of column names and constant column names

This is surely easy, but for the life of me I can't find the right syntax.

I want to keep every column whose name starts with "ID_", however many there are and whatever numbers follow the prefix, plus certain other columns selected by their fixed names.

Something like the command below, which fails on the recreated data every time:

###Does not work, but shows what I am trying to do
testdf1 <- df1[,c(paste(idvec, collapse="','"),"ConstantNames_YESwant")]

Recreated data:

rand <- sample(1:2, 1)
if(rand==1){
  df1 <- data.frame(
    ID_0=0,
    ID_1=1,
    ID_2=11,
    ID_3=111,
    LotsOfColumnsWithVariousNames_NOwant="unwanted_data",
    ConstantNames_YESwant="wanted_data",
    stringsAsFactors = FALSE
  )
  desired.df1 <- data.frame(
    ID_0=0,
    ID_1=1,
    ID_2=11,
    ID_3=111,
    ConstantNames_YESwant="wanted_data",
    stringsAsFactors = FALSE
  )
}
if(rand==2){
  df1 <- data.frame(
    ID_0=0,
    ID_1=1,
    LotsOfColumnsWithVariousNames_NOwant="unwanted_data",
    ConstantNames_YESwant="wanted_data",
    stringsAsFactors = FALSE
  )
  desired.df1 <- data.frame(
    ID_0=0,
    ID_1=1,
    ConstantNames_YESwant="wanted_data",
    stringsAsFactors = FALSE
  )
}

Upvotes: 2

Views: 3859

Answers (2)

Ronak Shah

Reputation: 388982

In base R, you could do:

#Get all the ID columns (anchored, so only names starting with "ID_" match)
idvec <- grep("^ID_", colnames(df1), value = TRUE)
#Select the ID columns and the constant-name columns you want
df1[c(idvec, "ConstantNames_YESwant")]

#  ID_0 ID_1 ConstantNames_YESwant
#1    0    1           wanted_data
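
As for why the attempt in the question fails: paste() with collapse returns a single string, and no column has that name. The sketch below assumes the rand == 2 version of df1 from the question; the names keep and collapsed are only illustrative. It uses startsWith() from base R (available since R 3.3.0) as an alternative to grep.

```r
# Assumed sample data: the rand == 2 case from the question
df1 <- data.frame(
  ID_0 = 0,
  ID_1 = 1,
  LotsOfColumnsWithVariousNames_NOwant = "unwanted_data",
  ConstantNames_YESwant = "wanted_data",
  stringsAsFactors = FALSE
)

# Literal prefix test on the column names
idvec <- names(df1)[startsWith(names(df1), "ID_")]

# paste(..., collapse = ) glues the names into ONE string,
# which is why indexing with it cannot find any column
collapsed <- paste(idvec, collapse = "','")
length(collapsed)

# c() keeps each name as its own element, which is what `[` needs
keep <- c(idvec, "ConstantNames_YESwant")  # 'keep' is an illustrative name
testdf1 <- df1[, keep, drop = FALSE]
names(testdf1)
```

drop = FALSE only matters if the selection ever shrinks to a single column; it keeps the result a data frame instead of a vector.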

Upvotes: 3

Tung

Reputation: 28371

Is this what you want?

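library(tidyverse)

# Option 1: regular expression, anchored to names that begin with "ID_"
df1 %>% 
  select(matches("^ID_"), ConstantNames_YESwant)

# Option 2: literal prefix match
df1 %>% 
  select(starts_with("ID_"), ConstantNames_YESwant)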

# ID_0 ID_1 ConstantNames_YESwant
# 1    0    1           wanted_data
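
If the constant column names live in a character vector rather than being typed into the call, all_of() can pass them to select(). This is a sketch under assumptions: it uses the rand == 2 data from the question, loads only dplyr (which re-exports all_of() from tidyselect 1.0.0 or later), and constant_cols is a hypothetical variable name.

```r
library(dplyr)

# Assumed sample data: the rand == 2 case from the question
df1 <- data.frame(
  ID_0 = 0,
  ID_1 = 1,
  LotsOfColumnsWithVariousNames_NOwant = "unwanted_data",
  ConstantNames_YESwant = "wanted_data",
  stringsAsFactors = FALSE
)

# Hypothetical vector holding the fixed column names to keep
constant_cols <- c("ConstantNames_YESwant")

# starts_with() picks up however many ID_ columns exist;
# all_of() errors if any name in constant_cols is missing from df1
result <- df1 %>%
  select(starts_with("ID_"), all_of(constant_cols))

names(result)
```

Swapping all_of() for any_of() would silently skip names that are absent instead of raising an error, which may or may not be what you want.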

Upvotes: 4
