user1471980

Reputation: 10656

How do you create a regex to subset a data frame based on some search strings?

I am trying to search for strings to subset the data frame. My df looks like this:

dput(df)
structure(list(Cause = structure(c(2L, 1L), .Label = c("jasper not able to read the property table after the release", 
"More than 7000  messages loaded which stuck up"), class = "factor"), 
    Resolution = structure(1:2, .Label = c("jobs and reports are processed", 
    "Updated the property table which resolved the issue."), class = "factor")), .Names = c("Cause", 
"Resolution"), class = "data.frame", row.names = c(NA, -2L))

I am trying to do this:

df1<-subset(df, grepl("*MQ*|*queue*|*Queue*", df$Cause))

I want to search for MQ, queue or Queue in the Cause column and subset the data frame df to the matching records. It does not seem to work: it also catches records in which none of the strings MQ, queue or Queue is present.

Is this the right way to do it, or are there other approaches I could try?

Upvotes: 4

Views: 207

Answers (2)

G. Grothendieck

Reputation: 270248

Transferred from comments:

subset(df, grepl("MQ|Queue|queue", Cause))

or, if any letter case is acceptable:

subset(df, grepl("mq|queue", Cause, ignore.case = TRUE))

For more information, try ?regex and ?grepl from within R.
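A minimal, self-contained check of both patterns, assuming the two rows from the question's dput output (rebuilt here as plain character columns instead of factors; grepl coerces its input with as.character, so the matches are the same) plus a hypothetical third row containing "Queue" so that something actually matches. The names exact and any_case are illustrative only.

```r
# Rows from the question's dput, plus a hypothetical third row mentioning "Queue"
df <- data.frame(
  Cause = c("More than 7000  messages loaded which stuck up",
            "jasper not able to read the property table after the release",
            "blabla Queue blabla"),
  Resolution = c("jobs and reports are processed",
                 "Updated the property table which resolved the issue.",
                 "hop"),
  stringsAsFactors = FALSE
)

# Case-sensitive: keeps rows containing the literal text MQ, Queue or queue
exact <- subset(df, grepl("MQ|Queue|queue", Cause))

# Case-insensitive: would also keep e.g. "mq" or "QUEUE"
any_case <- subset(df, grepl("mq|queue", Cause, ignore.case = TRUE))

print(exact)
print(any_case)
```

Both calls keep only the third row here: neither original Cause string contains MQ or queue in any case.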

Upvotes: 1

Vincent Bonhomme

Reputation: 7453

The regexp below seems to work. I have added a line to your data.frame so that it's a more interesting example.

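I think the problem came from the *s in your regexp: in a regular expression, * means "zero or more of the preceding character", not "any text" as in a shell wildcard, so for example MQ* matches any M followed by zero or more Qs. I also added parentheses to group the alternatives around |, but I don't think that is mandatory here.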

df <- data.frame(Cause=c("jasper not able to read the property table after the release", 
                         "More than 7000  messages loaded which stuck up",
                         "blabla Queue blabla"),
                 Resolution = c("jobs and reports are processed", 
                                "Updated the property table which resolved the issue.",
                                "hop"))

> head(df)
                                                         Cause                                           Resolution
1 jasper not able to read the property table after the release                       jobs and reports are processed
2               More than 7000  messages loaded which stuck up Updated the property table which resolved the issue.
3                                          blabla Queue blabla                                                  hop

> subset(df, grepl("(MQ)|(queue)|(Queue)", df$Cause))
                Cause Resolution
3 blabla Queue blabla        hop

Is this what you wanted?
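A small sketch of the two points about the regexp, assuming the same three Cause strings as the example data frame above (the variable names are illustrative): a trailing * only makes the preceding character optional and repeatable, and the parentheses do not change what matches.

```r
cause <- c("More than 7000  messages loaded which stuck up",
           "jasper not able to read the property table after the release",
           "blabla Queue blabla")

# "MQ*" means "M followed by zero or more Qs", so any capital M matches
with_star <- grepl("MQ*", cause)

# Without the *, the literal two-character string "MQ" is required
without_star <- grepl("MQ", cause)

# Grouping the alternatives with parentheses gives the same result
grouped   <- grepl("(MQ)|(queue)|(Queue)", cause)
ungrouped <- grepl("MQ|queue|Queue", cause)

print(with_star)
print(without_star)
print(identical(grouped, ungrouped))
```

Here with_star is TRUE only for the "More than 7000 ..." string (because of its capital M), without_star is FALSE everywhere, and both grouped and ungrouped select only the "blabla Queue blabla" row.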

Upvotes: 6
