Reputation: 650
How do I grep columns from an R data.table with a regex pattern?
I need to extract the columns whose names contain the pattern "nbr[0-9]_", for example nbr1_L or nbr6_L.
library(data.table)
names <- c("nbr4", "nbr4_L", "nbr5", "nbr6_L", "nbr7_L", "nbr4_L", "nbr4_L")
dt <- data.table(cbind("aa", "bb", "cc", "dd", "ff", "gg", "hh"))
setnames(dt, names)
I tried the following:
dt[, .SD, .SDcols = names(dt) %like% "nbr*_*"]
grep('^nbr\\d+\\_\\*$', names(dt), value=TRUE)
Upvotes: 2
Views: 1206
Reputation: 1369
I think @mt1022 has an excellent, elegant solution.
But just to help the OP, @Omer, further, I'll point out that your attempt to use .SD, .SDcols almost worked!
You certainly can use that method; your regex pattern just needs to be corrected. For example, this works if all you are after is a single-digit number in your column names:
dt[, .SD, .SDcols = names(dt) %like% "nbr[0-9]_"]
nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
1: bb dd ff gg hh
Better, in case your column names contain numbers with more than one digit, use:
dt[, .SD, .SDcols = names(dt) %like% "nbr[0-9]+_"]
Or, best: substitute the pattern @mt1022 used in his solution, which adds ^ at the beginning, meaning matches are found only at the start of the string. \\d in @mt1022's solution is equivalent to [0-9] above.
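As a small illustration of the difference the ^ anchor makes, here is a sketch in base R. It uses grepl directly (which %like% wraps), and the names in nms are hypothetical, chosen only to show a prefixed name and a multi-digit number:

```r
# Hypothetical column names: the second one has a prefix before "nbr".
nms <- c("nbr4_L", "xnbr4_L", "nbr12_L", "nbr5")

unanchored <- grepl("nbr[0-9]+_", nms)    # matches anywhere in the name
anchored   <- grepl("^nbr[0-9]+_", nms)   # matches only at the start of the name
with_d     <- grepl("^nbr\\d+_", nms)     # \\d used in place of [0-9]

print(unanchored)                   # TRUE  TRUE  TRUE FALSE
print(anchored)                     # TRUE FALSE  TRUE FALSE
print(identical(anchored, with_d))  # TRUE
```

Either logical vector can then be passed to .SDcols in the same way as in the calls above.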
Upvotes: 1
Reputation: 76402
If you want to grep "nbr" followed by exactly one digit, followed by an underscore, followed by exactly one character, then try this:
grep("^nbr[[:digit:]]{1}_.$", names(dt), value = TRUE)
#[1] "nbr4_L" "nbr6_L" "nbr7_L" "nbr4_L" "nbr4_L"
So, to subset the data.table it would be
i <- grep("^nbr[[:digit:]]{1}_.$", names(dt), value = TRUE)
dt[, ..i]
# nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
#1: bb dd ff bb bb
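Note that you don't really need the argument value = TRUE. In fact, with duplicated column names it is safer to omit it: selecting by name returned the first nbr4_L column three times above (bb, bb, bb), whereas selecting by position returns each matching column: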
j <- grep("^nbr[[:digit:]]{1}_.$", names(dt))
dt[, ..j]
# nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
#1: bb dd ff gg hh
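The difference between the two outputs can be reproduced on the names alone. This is only a sketch in base R: it assumes that a lookup by name resolves to the first matching column, which is what the bb bb bb output above suggests, and it uses match() to mimic that lookup. The variable names are illustrative.

```r
# Column names from the question; "nbr4_L" appears at positions 2, 6 and 7.
nms <- c("nbr4", "nbr4_L", "nbr5", "nbr6_L", "nbr7_L", "nbr4_L", "nbr4_L")
pattern <- "^nbr[[:digit:]]{1}_.$"

by_name <- grep(pattern, nms, value = TRUE)  # matching names, duplicates repeated
by_pos  <- grep(pattern, nms)                # matching positions

# Looking each name up again finds only its first occurrence.
first_hit <- match(by_name, nms)

print(by_pos)     # 2 4 5 6 7
print(first_hit)  # 2 4 5 2 2
```

Positions 6 and 7 are lost in the name-based lookup, so indexing by position is the safer choice whenever names are not unique.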
Upvotes: 1
Reputation: 2210
Works better without data.table:
dt <- as.data.frame(dt)
dt[,grep("nbr[0-9]_",colnames(dt))]
# nbr4_L nbr6_L nbr7_L nbr4_L.1 nbr4_L.2
#1 bb dd ff gg hh
Upvotes: 1
Reputation: 17289
Here is a way to do it with %like%:
> idx <- names(dt) %like% '^nbr\\d+_.*'
> dt[, ..idx]
nbr4_L nbr6_L nbr7_L nbr4_L nbr4_L
1: bb dd ff gg hh
Upvotes: 5