R: Regular Expressions in R - Multi string extraction

Question

Say I have a string like this:

[1] "Degradation: AGL, PGM1, PGM2, PGM3, PYGL, PYGM.

"

I want to extract each of these gene IDs into a vector. I could probably use strsplit in this case, but I want to do this with a regex because I will later have more complex cases. Say I want to extract every substring that matches '[A-Z0-9]{2,}' (that is, any run of at least two capital letters and digits).

Thoughts?

Fojtasek · Accepted Answer

The stringr package makes this kind of thing pretty easy.

> library(stringr)
> x <- "Degradation: AGL, PGM1, PGM2, PGM3, PYGL, PYGM.

"
> str_extract_all(x, '[A-Z0-9]{2,}')
[[1]]
[1] "AGL"  "PGM1" "PGM2" "PGM3" "PYGL" "PYGM"
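
If you would rather not add a dependency, the same extraction can be sketched in base R with gregexpr and regmatches. This is only a minimal sketch, not part of the answer above: the variable names (genes, x2, loose, strict) and the second sample string are made up for illustration, and the stricter pattern is an assumption about what a "gene ID" looks like (starts with a capital letter), not something stated in the question.

```r
# Input string from the question (trailing newlines omitted; they do not affect matching)
x <- "Degradation: AGL, PGM1, PGM2, PGM3, PYGL, PYGM."

# gregexpr() finds every match position; regmatches() pulls out the matched text.
# The result is a list with one element per input string, so take [[1]].
genes <- regmatches(x, gregexpr("[A-Z0-9]{2,}", x))[[1]]
print(genes)
# [1] "AGL"  "PGM1" "PGM2" "PGM3" "PYGL" "PYGM"

# Caveat: the original pattern also matches runs made only of digits.
# x2 is a hypothetical input used only to show the difference.
x2 <- "Degradation in 2024: AGL, PGM1."
loose  <- regmatches(x2, gregexpr("[A-Z0-9]{2,}", x2))[[1]]
# Assumed stricter rule: an ID must start with a capital letter.
strict <- regmatches(x2, gregexpr("[A-Z][A-Z0-9]+", x2))[[1]]
print(loose)   # "2024" "AGL" "PGM1"
print(strict)  # "AGL" "PGM1"
```

The single leading "D" in "Degradation" is not returned by either pattern because it is followed by a lowercase letter, so the run is only one character long.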
