Reputation: 169

Subset columns based on column names

I have a data frame, df1, with IDs:

df1 <- read.table(text="ID
8765
1879
8706
1872
0178
0268
0270
0269
0061
0271", header=TRUE)

and a second data frame, df2, with these column names:

> names(df2)
 [1] "TW_3784.IT"   "TW_3970.IT"   "TW_1879.IT"   "TW_0178.IT"   "SF_0271.IT" "TW_3782.IT"  
 [7] "TW_3783.IT"   "TW_8765.IT"   "TW_8706.IT"   "SF_0268.IT" "SF_0270.IT" "SF_0269.IT"
[13] "SF_0061.IT"

What I need is to keep only the columns of df2 whose names partially match an ID in df1.

My attempts so far:

Using dplyr:

df3 = df2 %>%
  dplyr::select(df2 , dplyr::contains(df1$ID))

Error:

Error in dplyr::contains(df1$ID) : is_string(match) is not TRUE

Using grepl:

df3 = df2[,grepl(df1$ID, names(df2))]

Warning message:
In grepl(df1$ID, names(df2)) :
  argument 'pattern' has length > 1 and only the first element will be used

Upvotes: 0

Answers (3)

Dan

Reputation: 12074

Here's a solution that uses the dplyr package.

df2 %>% select(matches(paste(df1$ID, collapse = "|")))

This pastes together the IDs from df1 with | as a separator (meaning logical OR) like this:

"8765|1879|8706|1872|178|268|270|269|61|271"

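This is needed because matches() then looks for column names that match any one of these numbers, and those columns are selected. Note that the IDs were read as integers, so their leading zeros are gone and a short ID such as 61 will match any column name containing those digits. Loading dplyr provides select, matches and %>%.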
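If looser matches are a concern, one possible refinement is to restore the leading zeros and anchor each ID between the "_" and the "." of the column name. The sketch below is only an illustration in base R: it assumes every ID is four digits wide and that the names always follow the PREFIX_dddd.IT layout shown in the question, and the one-row df2 is a hypothetical stand-in that carries only the column names.

```r
# Rebuild the question's inputs. df2 is a hypothetical one-row data frame
# that only carries the column names from the question.
df1 <- data.frame(ID = c(8765, 1879, 8706, 1872, 178, 268, 270, 269, 61, 271))
cols <- c("TW_3784.IT", "TW_3970.IT", "TW_1879.IT", "TW_0178.IT", "SF_0271.IT",
          "TW_3782.IT", "TW_3783.IT", "TW_8765.IT", "TW_8706.IT", "SF_0268.IT",
          "SF_0270.IT", "SF_0269.IT", "SF_0061.IT")
df2 <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols),
                            dimnames = list(NULL, cols)))

# Restore the leading zeros that were dropped when the IDs were read as
# integers (assumption: all IDs are four digits wide).
ids <- sprintf("%04d", as.integer(df1$ID))

# Anchor each ID between "_" and "." so that 61 cannot match inside a
# longer number.
pattern <- paste0("_(", paste(ids, collapse = "|"), ")\\.")

keep <- grepl(pattern, names(df2))
df3 <- df2[, keep, drop = FALSE]  # drop = FALSE keeps a data frame
names(df3)
```

The same pattern string could be passed to matches() in the dplyr pipeline above instead of using grepl directly.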

Upvotes: 1

milan

Reputation: 4970

As there is a clear pattern in the column names, you can use substr to extract each four-digit ID. Convert it to numeric to remove the leading zeros, so that it compares equal to the integer IDs in df1, then use which to identify the column numbers that you want to keep.

df2 <- c("TW_3784.IT", "TW_3970.IT", "TW_1879.IT", "TW_0178.IT", "SF_0271.IT", "TW_3782.IT")

numbers <- which(as.numeric(substr(df2, 4, 7)) %in% df1[,1])

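Next, you can use these column numbers to subset your data frame. Note that df2 above is a character vector standing in for the column names; with the real data frame, compute numbers from names(df2) and subset with df2[, numbers].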
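Applied to an actual data frame, the steps above might look like the following sketch. The four-column df2 and its values are made up for illustration, and the code assumes the ID always occupies characters 4 to 7 of the column name.

```r
# IDs as read in the question (integers, leading zeros already lost).
df1 <- data.frame(ID = c(8765L, 1879L, 8706L, 1872L, 178L,
                         268L, 270L, 269L, 61L, 271L))

# Hypothetical data frame with a few of the question's column names.
df2 <- data.frame(TW_3784.IT = 1, TW_1879.IT = 2,
                  SF_0271.IT = 3, SF_0061.IT = 4)

# Characters 4 to 7 of each column name hold the ID; as.numeric() turns
# "0271" into 271 so it compares equal to the integer IDs.
col_ids <- as.numeric(substr(names(df2), 4, 7))

# Positions of the columns whose ID appears in df1.
numbers <- which(col_ids %in% df1$ID)

df3 <- df2[, numbers, drop = FALSE]  # drop = FALSE keeps a data frame
names(df3)
```

Using drop = FALSE is a defensive choice here: without it, a single matching column would come back as a plain vector rather than a data frame.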

Upvotes: 1

Adam Sampson

Reputation: 2021

In df1, your ID column is of integer type, not character:

str(df1)
'data.frame':   10 obs. of  1 variable:
 $ ID: int  8765 1879 8706 1872 178 268 270 269 61 271

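Convert the column to character so that the helper receives strings. Note that is_string() checks for a single string, so on dplyr versions that raise this error a vector of several IDs may still be rejected; collapsing the IDs into one pattern for matches(), as in the answer above, avoids that.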

df1$ID <- as.character(df1$ID)
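A related point: converting after the fact does not bring back the leading zeros. If the IDs can be re-read from their source, one option is to read them as character in the first place. The sketch below assumes that is possible and uses a shortened, made-up subset of the question's data; the names cols, hits and keep are illustrative only. It then does a literal (non-regex) match for each ID in base R.

```r
# Read the IDs as character so that "0178" is not turned into 178.
df1 <- read.table(text = "ID
8765
0178
0061
1872", header = TRUE, colClasses = "character")

# A few of the question's column names, as a plain character vector.
cols <- c("TW_3784.IT", "TW_8765.IT", "TW_0178.IT", "SF_0061.IT")

# One logical vector per ID: does each column name contain that ID
# literally (fixed = TRUE disables regex interpretation)?
hits <- lapply(df1$ID, grepl, x = cols, fixed = TRUE)

# A column is kept if any ID matched it.
keep <- Reduce(`|`, hits)
cols[keep]
```

With a real data frame, the same logical vector could be computed from names(df2) and used as df2[, keep, drop = FALSE].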