Use grep() to select character strings with "XXX-0000" syntax

Question

Given a character vector:

    id.data = c("XXX-2355",
                "XYz-03",
                "XYU-3", 
                "ABC-1234",
                "AX_2356",
                "AbC234")

What is the appropriate way to grep for ONLY the entries that DON'T follow an "XXX-0000" pattern? In the example above I'd want to end up with only "XXX-2355" and "ABC-1234". There are tens of thousands of records.

I tried selecting by individual issue. For example,

    id.error = rep(NA, length(id.data))
    id.error[-grep("-", id.data)] = "hyphen"

This was obviously really inefficient, and I have no way of knowing every possible error. strsplit() was useful up to a point, but only when I know where to split.

Thanks!

devnull · Accepted Answer

You seem to be looking for the invert argument of grep(). From ?grep:

invert: logical. If TRUE return indices or values for elements that do not match.

    > id.data = c("XXX-2355",
    +                 "XYz-03",
    +                 "XYU-3",
    +                 "ABC-1234",
    +                 "AX_2356",
    +                 "AbC234")
    > grep("[A-Z]{3}-[0-9]{4}", id.data)
    [1] 1 4
    > grep("[A-Z]{3}-[0-9]{4}", id.data, value = TRUE)
    [1] "XXX-2355" "ABC-1234"
    > grep("[A-Z]{3}-[0-9]{4}", id.data, invert = TRUE)
    [1] 2 3 5 6
    > grep("[A-Z]{3}-[0-9]{4}", id.data, invert = TRUE, value = TRUE)
    [1] "XYz-03"  "XYU-3"   "AX_2356" "AbC234"

Not sure whether you want the strings that match the said pattern or those that don't. The above example lists both options.
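
One caveat worth sketching: the pattern above is not anchored, so it matches anywhere inside a string. With tens of thousands of records, an entry with extra characters around a valid-looking core would pass unnoticed. Below is a minimal sketch that anchors the pattern with ^ and $ and uses grepl() to get a logical vector. The entry "XXXX-23555" and the label "bad format" are made up for illustration and are not from the question.

```r
id.data <- c("XXX-2355", "XYz-03", "XYU-3", "ABC-1234", "AX_2356", "AbC234",
             "XXXX-23555")  # last entry is a made-up example, not in the question

# Unanchored: TRUE if the pattern occurs anywhere in the string
loose <- grepl("[A-Z]{3}-[0-9]{4}", id.data)

# Anchored: the whole string must be 3 capitals, a hyphen, then 4 digits
strict <- grepl("^[A-Z]{3}-[0-9]{4}$", id.data)

# Entries the unanchored pattern accepts but the anchored one rejects
id.data[loose & !strict]
# [1] "XXXX-23555"

# Everything that does not follow the "XXX-0000" format exactly
id.data[!strict]
# [1] "XYz-03"     "XYU-3"      "AX_2356"    "AbC234"     "XXXX-23555"

# A flag vector in the spirit of the question's id.error
id.error <- ifelse(strict, NA_character_, "bad format")
```

Because grepl() returns one TRUE/FALSE per element, the result can sit alongside the data as a column, which may be easier to work with than indices. Note that ?regex warns that ranges such as [A-Z] can be locale-dependent; [[:upper:]] and [[:digit:]] are alternatives if that turns out to matter for your data.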
