natash

Reputation: 127

Excluding variables with grep in R

I have a dataset like the one below; my real data is much larger, with many more variables. I want to run some computations on a specific subset of the variables. For example, I want to select T_H_01 through T_H_03, but leave out T_H_G and T_H_S. I tried doing this with grep, but I don't know how to make it match all the "T_H" items while excluding specific variables such as T_H_G and T_H_S.

df <- read.table(header=TRUE, text="
T_H_01 T_H_02 T_H_03 T_H_G T_H_S 
5 1 2 1 5 
3 1 3 3 4 
2 1 3 1 3  
4 2 5 5 3 
5 1 4 1 2 
")

df[,grep("T_H.",names(df))]

Thank you!

Upvotes: 0

Views: 1034

Answers (3)

Tim Biegeleisen

Reputation: 521179

If you just want columns T_H_ followed by a number, then simply phrase that in your call to grep:

df[, grep("^T_H_\\d+$", names(df))]

If instead you want to phrase the search as explicitly excluding T_H_G and T_H_S, then you could use a negative lookahead for that:

df[, grep("^T_H_(?![GS]$).+$", names(df), perl=TRUE)]
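As a quick check, the two patterns above can be compared on the column names from the question. This is only a minimal sketch: `by_digits` and `by_lookahead` are illustrative names, and `value = TRUE` is used so that grep returns the matching names rather than their positions.

```r
df <- read.table(header = TRUE, text = "
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")

# Pattern 1: keep only names of the form T_H_ followed by digits
by_digits <- grep("^T_H_\\d+$", names(df), value = TRUE)

# Pattern 2: keep T_H_ names unless the suffix is exactly G or S
# (lookaheads require perl = TRUE)
by_lookahead <- grep("^T_H_(?![GS]$).+$", names(df), value = TRUE, perl = TRUE)

identical(by_digits, by_lookahead)  # TRUE for this data
```

The two patterns agree here only because every non-numeric suffix in the example is G or S; with other suffixes (say, a hypothetical T_H_X) the lookahead version would keep that column and the digit version would not.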

Upvotes: 2

Terru_theTerror

Reputation: 5017

You can use this approach to filter out the columns you don't need:

idx  <- grep("T_H.", names(df))                                  # positions of all T_H columns
excl <- c(grep("T_H_G", names(df)), grep("T_H_S", names(df)))    # positions to drop
df[, idx[!(idx %in% excl)]]
  T_H_01 T_H_02 T_H_03
1      5      1      2
2      3      1      3
3      2      1      3
4      4      2      5
5      5      1      4

If the columns you want to exclude share a common pattern, you can put that pattern into the grep condition instead of listing each column.
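One way this might look, assuming (purely for illustration) that the unwanted columns are the ones ending in `_G` or `_S`. The variable name `keep` is hypothetical, and `drop = FALSE` is added so the result stays a data frame even if only one column matches.

```r
df <- read.table(header = TRUE, text = "
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")

# TRUE for names that start with T_H_ and do not end in _G or _S
# (the _G/_S suffix rule is an assumption about the naming scheme)
keep <- grepl("^T_H_", names(df)) & !grepl("_[GS]$", names(df))

result <- df[, keep, drop = FALSE]
names(result)
```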

Upvotes: 1

George

Reputation: 903

You could do something like this, keeping the names to exclude in a separate vector:

ex <- c('T_H_G', 'T_H_S' )

df[,grepl("T_H.", names(df)) & !names(df) %in% ex]
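The same idea can also be written with column names rather than a logical mask. The following is a sketch of that variant, not part of the original answer: it uses `setdiff` to remove the excluded names from the grep matches, and `keep` is an illustrative name.

```r
df <- read.table(header = TRUE, text = "
T_H_01 T_H_02 T_H_03 T_H_G T_H_S
5 1 2 1 5
3 1 3 3 4
2 1 3 1 3
4 2 5 5 3
5 1 4 1 2
")

ex <- c("T_H_G", "T_H_S")

# All T_H_ column names, minus the ones listed in ex
keep <- setdiff(grep("^T_H_", names(df), value = TRUE), ex)

# drop = FALSE keeps a data frame even when a single column remains
result <- df[, keep, drop = FALSE]
```

Selecting by name makes it easy to inspect `keep` before subsetting, which can help when the exclusion list grows.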

Upvotes: 1
