amisos55

Reputation: 1979

Special characters in function in R

I want to reuse some existing code inside a function. Here is the code:

str_extract(temp, pattern = "(?:[^-]*\\-){1}([^\\-]*)$")

Here is the "temp" variable value:

WV-Online-Reading-S1-COMBINED-ELA-3

Here is the extracted output after running this function:

ELA-3

Can someone please explain to me how the special characters in "pattern = .." work?

Using the same function, I would like to convert this:

AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1

to this:

ELA-3

A good reference to those special characters would also be useful.

Thanks!

Upvotes: 0

Views: 398

Answers (1)

eastclintw00d

Reputation: 2364

To find the correct regular expression, you need to know exactly which systematic part of your strings you are looking for. From your post I assume that you want to extract the "ELA-" string and the digit at the end of each string. You could do it like this:

strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3", "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

gsub(".*(ELA\\-).*(\\d$)", "\\1\\2", strings)

[1] "ELA-3" "ELA-1"

I will briefly explain the components of the pattern:

  • .* matches zero or more arbitrary characters
  • ELA\\- matches 'ELA-'
  • \\d$ matches a single digit at the end of the string
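
To see what each component does on its own, it can help to run the pieces separately. The following is a minimal sketch in base R; first_match() is a hypothetical helper introduced only for this illustration and is not part of any package.

```r
strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3",
             "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

# Hypothetical helper: return the first match of `pattern` in each string.
first_match <- function(pattern, x) regmatches(x, regexpr(pattern, x))

# "ELA\\-" in R source is the regex ELA\- : the literal text "ELA-".
ela_part <- first_match("ELA\\-", strings)
print(ela_part)
# [1] "ELA-" "ELA-"

# "\\d$" is the regex \d$ : one digit directly before the end of the string.
last_digit <- first_match("\\d$", strings)
print(last_digit)
# [1] "3" "1"

# ".*" is greedy, so on its own it swallows the whole string.
greedy_all <- first_match(".*", strings)
print(identical(greedy_all, strings))
# [1] TRUE
```

The doubled backslashes are an R string convention: "\\d" in source code is the two-character regex \d once R has parsed the string.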

The parentheses form capture groups, which can be referenced in the replacement as \\1 (first capture group) and \\2 (second capture group). Because the pattern matches the entire string, gsub() replaces the whole string with the contents of the two groups. Note that the second string yields "ELA-1" because \\d$ picks up the final digit of SEG1, not the 3 in the middle of the string. As I do not know the exact rule behind what you are looking for, the pattern might still need adjustments to fit your needs.
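
As a rough illustration of the backreferences, and of one possible adjustment, the sketch below first keeps each group separately and then tries a variant pattern. The variant is an assumption about the intended rule (take the first dash-delimited, all-digit field that follows "ELA-"); it is not taken from the question and may not generalise to other strings.

```r
strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3",
             "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

original <- ".*(ELA\\-).*(\\d$)"

# Keeping only one backreference shows what each group captured.
group1 <- sub(original, "\\1", strings)
group2 <- sub(original, "\\2", strings)
print(group1)
# [1] "ELA-" "ELA-"
print(group2)
# [1] "3" "1"

# Assumed rule (hypothetical): after "ELA-", lazily skip whole fields
# until one starts with digits, capture those digits, then allow any tail.
variant <- "^.*(ELA-)(?:[^-]*-)*?(\\d+)(?:-.*)?$"
adjusted <- sub(variant, "\\1\\2", strings, perl = TRUE)
print(adjusted)
# [1] "ELA-3" "ELA-3"
```

perl = TRUE is used here because the variant relies on a non-capturing group and a lazy quantifier. Strings that do not match the pattern are returned unchanged by sub(), so it is worth checking for that case on real data.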

If you are interested only in the first digit that occurs in each string, you can get it with

library(stringr)
strings %>% str_extract("\\d")
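
Coming back to the pattern in the question, it can be read with the same building blocks. The sketch below uses base R with perl = TRUE rather than stringr, which is an assumption made only to keep the example free of package dependencies.

```r
strings <- c("WV-Online-Reading-S1-COMBINED-ELA-3",
             "AIR-GEN-SUM-UD-ELA-NH-COMBINED-3-SEG1")

# Pattern from the question, piece by piece:
#   [^-]*        zero or more characters that are not "-"
#   \\-          a literal "-"
#   (?: ... ){1} non-capturing group, repeated exactly once ({1} is redundant)
#   ([^\\-]*)    capture group: another run of non-"-" characters
#   $            end of the string
question_pattern <- "(?:[^-]*\\-){1}([^\\-]*)$"

# The match is therefore the last two dash-separated fields.
last_two <- regmatches(strings,
                       regexpr(question_pattern, strings, perl = TRUE))
print(last_two)
# [1] "ELA-3"  "3-SEG1"
```

This shows why the question's pattern cannot return "ELA-3" for the second string: it only ever looks at the final two fields, and there those are "3" and "SEG1".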

Upvotes: 2
