aprilian

Reputation: 650

grep names using regex from data table

How do I select columns from an R data.table using a regex pattern?

I need to extract the columns whose names match "nbr[0-9]_*", for example nbr1_L or nbr6_L.

library(data.table)

names <- c("nbr4", "nbr4_L", "nbr5", "nbr6_L", "nbr7_L", "nbr4_L", "nbr4_L")
dt <- data.table(cbind("aa", "bb", "cc", "dd", "ff", "gg", "hh"))
setnames(dt, names)

I tried the following:

dt[, .SD, .SDcols =  names(dt) %like% "nbr*_*"]

grep('^nbr\\d+\\_\\*$', names(dt), value=TRUE)

Upvotes: 2

Views: 1206

Answers (4)

krads

Reputation: 1369

I think @mt1022 has an excellent, elegant solution.

But just to help the OP, @Omer, further, I'll just point out that your attempt to use .SD, .SDcols almost worked!

You certainly can use that method; your regex pattern just needs to be corrected. For example, this would work if all you are after is a single-digit number in your column names:

dt[, .SD, .SDcols =  names(dt) %like% "nbr[0-9]_"]

   nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
1:     bb     dd     ff     gg     hh

Better, in case your column names contain numbers with more than one digit, use:

dt[, .SD, .SDcols =  names(dt) %like% "nbr[0-9]+_"]

Or, best of all: substitute the pattern @mt1022 used in his solution, which adds ^ at the beginning so that matches are found only at the start of the string. \\d in @mt1022's solution is equivalent to [0-9] above.
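To make the difference between the three patterns concrete, here is a minimal sketch in base R. The column names in cols are hypothetical (they are not the ones from the question) and were picked only so that each pattern gives a different result; grepl() is used directly so the snippet runs without data.table.

```r
# Hypothetical column names, chosen to show where the patterns differ.
cols <- c("nbr4", "nbr4_L", "nbr12_L", "xnbr6_L")

# Exactly one digit, unanchored: misses "nbr12_L" and accepts "xnbr6_L".
single <- cols[grepl("nbr[0-9]_", cols)]

# One or more digits, unanchored: still accepts "xnbr6_L".
multi <- cols[grepl("nbr[0-9]+_", cols)]

# Anchored at the start: only names that begin with "nbr<digits>_".
anchored <- cols[grepl("^nbr\\d+_", cols)]

print(single)
print(multi)
print(anchored)
```

With the names from the question all three patterns select the same columns, so the stricter pattern only matters once other naming schemes show up in the table.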

Upvotes: 1

Rui Barradas

Reputation: 76402

If you want to grep "nbr" followed by exactly one digit, followed by an underscore, followed by exactly one character, then try this:

grep("^nbr[[:digit:]]{1}_.$", names(dt), value = TRUE)
#[1] "nbr4_L" "nbr6_L" "nbr7_L" "nbr4_L" "nbr4_L"

So, to subset the data.table it would be

i <- grep("^nbr[[:digit:]]{1}_.$", names(dt), value = TRUE)
dt[, ..i]
#   nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
#1:     bb     dd     ff     bb     bb

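Note that you don't really need the argument value = TRUE. With duplicated column names it is actually safer to select by position: by name, every nbr4_L resolves to the first column with that name (bb above), whereas by position each column is picked individually: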

j <- grep("^nbr[[:digit:]]{1}_.$", names(dt))
dt[, ..j]
#   nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
#1:     bb     dd     ff     gg     hh
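The two outputs above differ only in the last two columns. A small base R sketch of why, using the column names from the question: match() stands in here for the lookup by name, which is an assumption about how the selection behaves, though it is consistent with the bb/bb result shown above.

```r
# Same names as in the question, including the repeated "nbr4_L".
nms <- c("nbr4", "nbr4_L", "nbr5", "nbr6_L", "nbr7_L", "nbr4_L", "nbr4_L")
pattern <- "^nbr[[:digit:]]{1}_.$"

by_name <- grep(pattern, nms, value = TRUE)  # matching names
by_pos  <- grep(pattern, nms)                # matching positions

# Looking the names up again resolves every "nbr4_L" to its first occurrence.
first_hit <- match(by_name, nms)

print(by_pos)     # 2 4 5 6 7
print(first_hit)  # 2 4 5 2 2
```

Positions keep the three nbr4_L columns distinct, while names collapse them onto column 2.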

Upvotes: 1

Nicolas2

Reputation: 2210

Works better without data.table:

dt <- as.data.frame(dt)
dt[,grep("nbr[0-9]_",colnames(dt))]
#  nbr4_L nbr6_L nbr7_L nbr4_L.1 nbr4_L.2
#1     bb     dd     ff       gg       hh

Upvotes: 1

mt1022

Reputation: 17289

Here is a way to do it with %like%:

> idx <- names(dt) %like% '^nbr\\d+_.*'
> dt[, ..idx]
   nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
1:     bb     dd     ff     gg     hh
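As far as I know, %like% is a thin wrapper around grepl(), so the logical index can be inspected in base R as well. The sketch below uses the column names from the question and also checks that the trailing .* does not change which names match; the variable names are illustrative only.

```r
nms <- c("nbr4", "nbr4_L", "nbr5", "nbr6_L", "nbr7_L", "nbr4_L", "nbr4_L")

# Logical mask: TRUE for every name starting with "nbr<digits>_".
with_tail    <- grepl("^nbr\\d+_.*", nms)
without_tail <- grepl("^nbr\\d+_", nms)

print(with_tail)
print(identical(with_tail, without_tail))
```

Since the match is not anchored at the end, the shorter pattern '^nbr\\d+_' should be enough.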

Upvotes: 5
