user1471980

Reputation: 10626

extract text between certain characters in R

I need to capture TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer] from the following string, i.e. everything between the first - and the @ sign.

i<-c("Current CPU load - TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]@example1.com")

I've tried this:

str_match(i, ".*-([^\\.]*)\\@.*")[,2]

I am getting NA. Any ideas?

Upvotes: 2

Views: 1750

Answers (3)

G. Grothendieck

Reputation: 269634

1) gsub Replace everything up to and including space, hyphen, space (i.e. .* - ) and everything from @ onward (i.e. @.*) with a zero-length string. No packages are needed:

gsub(".* - |@.*", "", i)
## [1] "TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]"

2) sub This would also work. It matches everything up to and including hyphen, space (i.e. .*- ), then captures everything until @ (i.e. (.*)@ ), followed by whatever is left (.*), and replaces the whole match with the capture group, i.e. the part within parentheses. It also uses no packages.

sub(".*- (.*)@.*", "\\1", i)
## [1] "TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]"

Note: We used this as input i:

i <- "Current CPU load - TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]@example1.com"
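One thing to keep in mind with both approaches is that sub and gsub return the input unchanged when the pattern does not match. The following is a minimal sketch, not part of the approaches above, of how approach 2 might be wrapped to return NA in that case. The helper name extract_between and the second test string are hypothetical, made up for illustration.

```r
# Hypothetical helper: wraps the sub() pattern from approach 2 and
# returns NA for inputs where the pattern does not match at all.
extract_between <- function(x) {
  pattern <- ".*- (.*)@.*"
  out <- sub(pattern, "\\1", x)
  # sub() leaves non-matching strings untouched, so mask them explicitly
  out[!grepl(pattern, x)] <- NA_character_
  out
}

x <- c(
  "Current CPU load - TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]@example1.com",
  "no delimiters here"  # made-up input without "- " or "@"
)
result <- extract_between(x)
print(result)
```

Because sub and grepl are vectorized, the helper works on a whole character vector at once.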

Upvotes: 5

Wiktor Stribiżew

Reputation: 626845

You may use

-\s*([^@]+)

See the regex demo

Details:

  • - - a hyphen
  • \s* - zero or more whitespaces
  • ([^@]+) - Group 1 capturing 1 or more chars other than @.

R demo:

> library(stringr)
> i<-c("Current CPU load - TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]@example1.com")
> str_match(i, "-\\s*([^@]+)")[,2]
[1] "TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]"

The same pattern can be used with base R regmatches/regexec:

> regmatches(i, regexec("-\\s*([^@]+)", i))[[1]][2]
[1] "TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]"

If you prefer a replacing approach you may use a sub:

> sub(".*?-\\s*([^@]+).*", "\\1", i)
[1] "TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]"

Here, .*? matches any 0+ chars, as few as possible, up to the first -; then - matches the hyphen, \\s* matches 0+ whitespaces, ([^@]+) captures 1+ chars other than @ into Group 1, and .* matches the rest of the string. The \1 in the replacement pattern inserts the contents of Group 1 into the result.
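To illustrate why the lazy .*? matters for this input, here is a small sketch comparing a greedy prefix with the lazy one, and checking the pattern from the question. Passing perl = TRUE is an assumption made here so that PCRE rules apply to all three calls; the variable names are only illustrative.

```r
i <- "Current CPU load - TEST_WF1_CORP[-application-com.ibm.ws.runtime.WsServer]@example1.com"

# Greedy .* gives back as little as possible, so the capture starts
# after the LAST hyphen before @ rather than the first one.
greedy <- sub(".*-([^@]*)@.*", "\\1", i, perl = TRUE)

# Lazy .*? stops at the FIRST hyphen, as in the pattern explained above.
lazy <- sub(".*?-\\s*([^@]+).*", "\\1", i, perl = TRUE)

# The pattern from the question: [^\\.]* cannot cross a dot, and every
# hyphen in the string is followed by at least one dot before @.
original_matches <- grepl(".*-([^\\.]*)\\@.*", i, perl = TRUE)

print(greedy)
print(lazy)
print(original_matches)
```

With this input, the greedy variant should yield only com.ibm.ws.runtime.WsServer], the lazy one the full expected value, and the question's pattern should not match at all, which is consistent with the NA reported in the question.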

Upvotes: 2

d.b

Reputation: 32548

The following should work:

extract <- unlist(strsplit(i, "- |@"))[2]
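Note that unlist() flattens the pieces of all input strings into one vector, so [2] only picks the right piece when i has length one. A possible sketch for a longer vector, assuming each string follows the same "prefix - value@host" layout (the sample strings below are made up):

```r
# Made-up inputs that follow the same layout as the string in the question
x <- c(
  "Current CPU load - A[-b-c.d]@example1.com",
  "Disk - B@example2.com"
)

# strsplit() returns one character vector of pieces per input string
parts <- strsplit(x, "- |@")

# Take the second piece of each; p[2] is NA if a string has no delimiter
result <- vapply(parts, function(p) p[2], character(1))
print(result)
```

vapply is used instead of sapply so that the result is guaranteed to be a character vector with one element per input.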

Upvotes: 2
