user1471980

Reputation: 10626

How do you extract column names that match any of a set of strings in R?

I have this data frame called t:

dput(t)
structure(list(Server = structure(c(2L, 3L, 4L, 5L, 1L, 1L), .Label = c("", 
"Server1", "Server2", "Server3", "Server4"), class = "factor"), 
    Date = structure(c(2L, 3L, 4L, 5L, 1L, 1L), .Label = c("", 
    "7/17/2017 15:01", "7/17/2017 15:02", "7/17/2017 15:03", 
    "7/17/2017 15:04"), class = "factor"), Host_CPU = c(1.161323547, 
    6.966178894, 0.656402588, 0.555137634, NA, NA), UsedMemPercent = c(11.33, 
    11.38, 11.38, 11.38, NA, NA), MY_REPORTING_NYAPP = c(1.05, 
    0.65, 0.52, 0.32, NA, NA)), .Names = c("Server", "Date", 
"Host_CPU", "UsedMemPercent", "MY_REPORTING_NYAPP"), class = "data.frame", row.names = c(NA, 
-6L))

I need to grep the column names that include any of the strings obtained by splitting a value on underscores.

For example,

app<-c("MY_NYAPP")

I need to find the column names that match any of the words in the app vector, split on "_", and assign the result to var.

app1<-unlist(strsplit(app, "_"))

var<-grep(app1,names(t), value=TRUE)

Any ideas?

Upvotes: 1

Views: 568

Answers (1)

Florian

Reputation: 25375

If I understand correctly, you want to check which column names contain both "MY" and "NYAPP" when the input is "MY_NYAPP"?

t = structure(list(Server = structure(c(2L, 3L, 4L, 5L, 1L, 1L), .Label = c("",
"Server1", "Server2", "Server3", "Server4"), class = "factor"),
    Date = structure(c(2L, 3L, 4L, 5L, 1L, 1L), .Label = c("",
    "7/17/2017 15:01", "7/17/2017 15:02", "7/17/2017 15:03",
    "7/17/2017 15:04"), class = "factor"), Host_CPU = c(1.161323547,
    6.966178894, 0.656402588, 0.555137634, NA, NA), UsedMemPercent = c(11.33,
    11.38, 11.38, 11.38, NA, NA), MY_REPORTING_NYAPP = c(1.05,
    0.65, 0.52, 0.32, NA, NA)), .Names = c("Server", "Date",
"Host_CPU", "UsedMemPercent", "MY_REPORTING_NYAPP"), class = "data.frame", row.names = c(NA,
-6L))

app<-c("MY_NYAPP")

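# Split the input on "_" to get the individual words: "MY" "NYAPP"
app2 = unlist(strsplit(app,"_"))
# Keep the column names for which every word matches
colnames(t)[rowSums(sapply(app2, function(x) grepl(x,colnames(t))))==length(app2)]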

which returns:

[1] "MY_REPORTING_NYAPP"
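
If you actually want the columns that match any of the words rather than all of them, the same idea can be adapted. The following is only a sketch under some assumptions: it works on a plain character vector of names instead of the data frame, "MY_OTHER" is a hypothetical extra column name added to show the difference between the two cases, and the names cols, parts, hits, any_match and all_match are made up for illustration. It uses fixed = TRUE so each word is treated as a literal string rather than a regular expression.

```r
# Column names from the question, plus a hypothetical "MY_OTHER" for illustration
cols <- c("Server", "Date", "Host_CPU", "UsedMemPercent",
          "MY_REPORTING_NYAPP", "MY_OTHER")
app <- c("MY_NYAPP")

# Split the input on a literal underscore: "MY" "NYAPP"
parts <- unlist(strsplit(app, "_", fixed = TRUE))

# One row per column name, one column per word; TRUE where the word occurs
hits <- vapply(parts, function(p) grepl(p, cols, fixed = TRUE),
               logical(length(cols)))
# Force a matrix shape so rowSums also works with a single column name
hits <- matrix(hits, nrow = length(cols))

any_match <- cols[rowSums(hits) > 0]              # at least one word occurs
all_match <- cols[rowSums(hits) == length(parts)] # every word occurs

print(any_match)
print(all_match)
```

With these inputs, any_match holds "MY_REPORTING_NYAPP" and "MY_OTHER", while all_match holds only "MY_REPORTING_NYAPP". On the original data frame you would pass names(t) instead of cols.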

Hope this helps.

Upvotes: 1
