Reputation: 127
I have a dataset like the following; my real one is much bigger, with many more variables. I want to run some computations on a specific subset of the variables. For example, I want to select the variables T_H_01 to T_H_03, but without T_H_G and T_H_S. I tried doing it with grep, but I don't know how to tell grep to take all the "T_H" items while excluding specific variables such as T_H_G and T_H_S.
df <- read.table(header=TRUE, text="
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")
df[,grep("T_H.",names(df))]
Thank you!
Upvotes: 0
Views: 1034
Reputation: 521179
If you just want columns T_H_ followed by a number, then simply phrase that in your call to grep:
df[, grep("^T_H_\\d+$", names(df))]
If instead you want to phrase the search as explicitly excluding T_H_G and T_H_S, then you could use a negative lookahead for that:
df[, grep("^T_H_(?![GS]$).+$", names(df), perl=TRUE)]
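To see how the two patterns differ, it may help to run them on the column names alone. The following is only a sketch: the name T_H_X1 is hypothetical and not in the question's data; it is added just to show a case where the two patterns disagree.

```r
# Column names from the question, plus one hypothetical name (T_H_X1)
nms <- c("T_H_01", "T_H_02", "T_H_03", "T_H_G", "T_H_S", "T_H_X1")

# Pattern 1: keep only names whose suffix consists of digits
digits_only <- grep("^T_H_\\d+$", nms, value = TRUE)

# Pattern 2: keep every T_H_ name unless the suffix is exactly G or S
# (lookaheads require perl = TRUE)
not_g_or_s <- grep("^T_H_(?![GS]$).+$", nms, value = TRUE, perl = TRUE)

print(digits_only)  # T_H_01, T_H_02, T_H_03
print(not_g_or_s)   # T_H_01, T_H_02, T_H_03, T_H_X1
```

On the question's actual columns both patterns select the same three variables; which one fits better depends on whether other non-numeric T_H_ columns should be kept.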
Upvotes: 2
Reputation: 5017
You can use this approach to filter out the unwanted columns:
df[,grep("T_H.",names(df))[!(grep("T_H.",names(df)) %in% c(grep("T_H_G",names(df)),grep("T_H_S",names(df))))]]
  T_H_01 T_H_02 T_H_03
1      5      1      2
2      3      1      3
3      2      1      3
4      4      2      5
5      5      1      4
If the columns to exclude follow a general pattern, you can fold that pattern into the grep condition.
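As a rough sketch of that idea, assuming the unwanted columns can be described as "ends in _G or _S" (an assumption made here for illustration, as are the variable names candidates and excluded), one grep can collect the candidates and a second can pick out the ones to drop:

```r
df <- read.table(header = TRUE, text = "
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")

# All columns that start with T_H_
candidates <- grep("^T_H_", names(df), value = TRUE)

# Of those, the ones matching the (assumed) exclusion pattern
excluded <- grep("_[GS]$", candidates, value = TRUE)

# Keep the candidates that are not excluded; drop = FALSE keeps a data frame
result <- df[, setdiff(candidates, excluded), drop = FALSE]
print(names(result))  # T_H_01, T_H_02, T_H_03
```

Selecting by name rather than by position keeps the exclusion step readable when the list of patterns grows.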
Upvotes: 1
Reputation: 903
You could do something like this:
ex <- c('T_H_G', 'T_H_S' )
df[,grepl("T_H.", names(df)) & !names(df) %in% ex]
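If the same selection is needed in several places, the idea above could be wrapped in a small helper. This is only a sketch: select_cols is a hypothetical name, not a base R function, and the anchored pattern "^T_H_" is a slightly stricter variant of the "T_H." used above.

```r
# Hypothetical helper: keep columns matching `pattern`, minus exact names in `exclude`
select_cols <- function(data, pattern, exclude = character(0)) {
  keep <- grepl(pattern, names(data)) & !(names(data) %in% exclude)
  # drop = FALSE returns a data frame even if only one column remains
  data[, keep, drop = FALSE]
}

df <- read.table(header = TRUE, text = "
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")

res <- select_cols(df, "^T_H_", exclude = c("T_H_G", "T_H_S"))
print(names(res))  # T_H_01, T_H_02, T_H_03
```

Because %in% compares whole names, this excludes only exact matches, which avoids accidentally dropping columns that merely contain T_H_G as a substring.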
Upvotes: 1