RanonKahn

Reputation: 862

Extracting substring using R

I want to extract a substring (the description details) from the following strings:

string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

I am trying to get the following:

ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""

I tried this:

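library(stringr)
# str_locate() returns a start/end matrix, so pick the "start" column explicitly.
# 12 is the number of characters in "description=".
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")[, "start"] + 12, str_locate(string1, "; iconUrl")[, "start"] - 1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")[, "start"] + 12, str_locate(string2, "; iconUrl")[, "start"] - 1)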

I am looking for a better way to accomplish this.

Upvotes: 0

Views: 204

Answers (2)

mattbawn

Reputation: 1378

You could try:

test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])

test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])

This simply splits the string on "; ", which divides each string into 6 elements. The square brackets select the 2nd element, and gsub replaces "description=" with nothing to remove it.
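If more than one field is needed, the same split can be taken a step further and turned into a named vector. This is only a sketch: parse_fields is a hypothetical helper name, and it assumes that no value contains "; " and that every piece has the form key=value.

```r
# Example input copied from the question (quoted so it is a valid R string).
string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"

# parse_fields() is a hypothetical helper, not part of any package.
parse_fields <- function(x) {
  # Drop the leading "@{" and the trailing "}".
  body <- sub("^@[{]", "", sub("[}]$", "", x))
  # Split into "key=value" pieces on the "; " separator.
  parts <- strsplit(body, "; ", fixed = TRUE)[[1]]
  # Everything before the first "=" is the key, everything after it is the value.
  keys <- sub("=.*$", "", parts)
  values <- sub("^[^=]*=", "", parts)
  setNames(values, keys)
}

fields1 <- parse_fields(string1)
fields1[["description"]]
fields1[["name"]]
```

Looking fields up by name avoids relying on the description always being the 2nd element.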

Upvotes: 1

lmo

Reputation: 38500

Using only base R's sub and back referencing, you could do

sub(".*description=(.*?);.*", "\\1", c(string1, string2))
[1] "The issue is open and ready for the assignee to start work on it." ""

The ".*" matches any sequence of characters, "description=" is a literal match, and ".*?" also matches any sequence of characters, but the ? forces a lazy match rather than a greedy one, so it stops at the first ";" that follows. ";" is a literal, and the "()" capture the sub-expression that is lazily matched. The back reference "\\1" returns the sub-expression captured in the parentheses.
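To see why the lazy quantifier matters, it may help to compare it with a greedy capture on a small made-up string. The input x below is a hypothetical, shortened example rather than one of the strings from the question. The third pattern uses a negated character class, which is a common alternative to a lazy match.

```r
# Hypothetical, shortened input with several ";" after the description.
x <- "id=1; description=some text; name=Open; status=ok"

# Lazy: the capture stops at the first ";" after "description=".
lazy <- sub(".*description=(.*?);.*", "\\1", x)

# Greedy: the capture runs on to the last ";" in the string.
greedy <- sub(".*description=(.*);.*", "\\1", x)

# Negated class: "[^;]*" cannot cross a ";", so no lazy quantifier is needed.
negated <- sub(".*description=([^;]*);.*", "\\1", x)

print(c(lazy = lazy, greedy = greedy, negated = negated))
```

Here the lazy and negated-class versions should both give "some text", while the greedy one also picks up "; name=Open".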

Using the base R functions regexec and regmatches gets a bit closer to the method in the OP. sapply with "[" is then used to extract the second element of each match (the captured group), which is the desired result.

sapply(regmatches(c(string1, string2),
                  regexec(".*description=(.*?);.*", c(string1, string2))),
       "[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
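If the same extraction is needed for other fields, the regexec approach could be wrapped in a small function. This is a rough sketch: extract_field is a hypothetical name, it assumes the key contains no regex metacharacters and is not the tail of a longer key, and it only finds fields that are followed by a ";" (so not the last one).

```r
string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

# extract_field() is a hypothetical helper built on regexec/regmatches.
extract_field <- function(x, key) {
  # Capture everything between "key=" and the next ";".
  pattern <- paste0(key, "=([^;]*);")
  hits <- regmatches(x, regexec(pattern, x))
  # Element 1 is the full match, element 2 the captured group;
  # strings without a match yield NA.
  vapply(hits,
         function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

extract_field(c(string1, string2), "description")
extract_field(c(string1, string2), "name")
```

Returning NA for a missing field keeps it distinguishable from a field that is present but empty, such as the description in string2.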

Upvotes: 2
