Reputation: 337
I am trying to subset a data frame based on column names that start with a particular string. Some columns are named like ABC_1, ABC_2, ABC_3 and others like ABC_XYZ_1, ABC_XYZ_2, ABC_XYZ_3.
How can I subset my data frame so that it contains only the ABC_1, ABC_2, ABC_3, ..., ABC_n columns and not ABC_XYZ_1, ABC_XYZ_2, ...?
I have tried this:
set.seed(1)
df <- data.frame( ABC_1 = sample(0:1, 3, replace = TRUE),
                  ABC_2 = sample(0:1, 3, replace = TRUE),
                  ABC_XYZ_1 = sample(0:1, 3, replace = TRUE),
                  ABC_XYZ_2 = sample(0:1, 3, replace = TRUE) )
df1 <- df[ , grepl( "ABC" , names( df ) ) ]
ind <- apply( df1 , 1 , function(x) any( x > 0 ) )
df1[ ind , ]
However, this returns both the ABC_1 ... ABC_n columns and the ABC_XYZ_1 ... ABC_XYZ_n columns. I am not interested in the ABC_XYZ_ columns, only in ABC_1, ABC_2, and so on. Any suggestions are much appreciated.
Upvotes: 1
Views: 3380
Reputation: 17611
To match "ABC_" followed by one or more digits (i.e. \\d+ or [0-9]+), you can use:
df1 <- df[ , grepl("ABC_\\d+", names( df ), perl = TRUE ) ]
# df1 <- df[ , grepl("ABC_[0-9]+", names( df ), perl = TRUE ) ] # another option
To force the column names to start with "ABC_", add ^ to the regex so that it matches only when "ABC_\d+" occurs at the start of the string, as opposed to anywhere within it:
df1 <- df[ , grepl("^ABC_\\d+", names( df ), perl = TRUE ) ]
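To see what each pattern selects, here is a small sketch on the column names from the question. The names XABC_1 and ABC_1_old are hypothetical, added only to show the effect of the anchors, and the data values are made up. The trailing $ and drop = FALSE are additions that go beyond the code above: $ requires the name to end right after the digits, and drop = FALSE keeps the result a data frame even if only one column matches.

```r
# Column names from the question, plus two hypothetical names
# (XABC_1, ABC_1_old) used only to illustrate anchoring.
nms <- c("ABC_1", "ABC_2", "ABC_XYZ_1", "ABC_XYZ_2", "XABC_1", "ABC_1_old")

unanchored <- grepl("ABC_\\d+", nms, perl = TRUE)    # match anywhere in the name
anchored   <- grepl("^ABC_\\d+", nms, perl = TRUE)   # name must start with ABC_<digits>
strict     <- grepl("^ABC_\\d+$", nms, perl = TRUE)  # whole name must be ABC_<digits>

nms[unanchored]  # "ABC_1" "ABC_2" "XABC_1" "ABC_1_old"
nms[anchored]    # "ABC_1" "ABC_2" "ABC_1_old"
nms[strict]      # "ABC_1" "ABC_2"

# Applied to a data frame (illustrative values, not the question's sample() output)
df <- data.frame(ABC_1 = c(0, 1, 1), ABC_2 = c(1, 0, 1),
                 ABC_XYZ_1 = c(0, 0, 1), ABC_XYZ_2 = c(1, 1, 0))
df1 <- df[, grepl("^ABC_\\d+$", names(df), perl = TRUE), drop = FALSE]
names(df1)       # "ABC_1" "ABC_2"
```

Whether the stricter ^...$ form is wanted depends on the real column names; for the four columns in the question, all three patterns select the same two columns.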
If dplyr is more to your liking, you might try
library(dplyr)
select(df, matches("^ABC_\\d+"))
Upvotes: 6
Reputation: 2986
Another straightforward solution is to use substr:
df1 <- df[,substr(names(df),5,7) != 'XYZ']
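Note that the substr test only excludes names with "XYZ" at positions 5 to 7; it keeps every other column, including ones that do not start with "ABC_". A minimal sketch of a stricter variant, assuming R 3.3.0 or later for startsWith, and using a hypothetical extra column OTHER with made-up values:

```r
# Illustrative data; the OTHER column is hypothetical and only shows
# that a pure "not XYZ" test would let unrelated columns through.
df <- data.frame(ABC_1 = c(0, 1, 1), ABC_2 = c(1, 0, 1),
                 ABC_XYZ_1 = c(0, 0, 1), ABC_XYZ_2 = c(1, 1, 0),
                 OTHER = c(5, 6, 7))

# substr alone: drops the ABC_XYZ_ columns but keeps OTHER
loose <- substr(names(df), 5, 7) != "XYZ"
names(df)[loose]   # "ABC_1" "ABC_2" "OTHER"

# Require the "ABC_" prefix as well as the absence of "XYZ" at positions 5-7
keep <- startsWith(names(df), "ABC_") & loose
df1 <- df[, keep, drop = FALSE]
names(df1)         # "ABC_1" "ABC_2"
```

If the data frame contains only ABC_ columns, as in the question, the original one-liner gives the same result.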
Upvotes: 0