tjebo

Reputation: 23747

Remove characters after the last occurrence of a delimiter, but keep them when the delimiter occurs only once, at the beginning

Sorry for the awkward title; I'm very open to suggestions on how to phrase it better.

This is very similar to Question 1, Question 2 and Question 3. All of those questions have solutions that remove everything after the last occurrence of the delimiter (most often the underscore), even when that occurrence is at the beginning of the string.

I need to keep those strings where the delimiter occurs only once, at the beginning of the string.

In the example, for x[3] and x[5], I'd like to keep "-3" and "-5". My first attempt keeps "-5", but not "-3".

x <- c("1 - 2","2-1", "-3", "4", "-5-6")

gsub("(.*)\\-.*$", "\\1", x)
#> [1] "1 " "2"  ""   "4"  "-5"

gsub("\\-[^\\-].*$", "", x)
#> [1] "1 " "2"  ""   "4"  ""

Edit: Ronak's current solution works for the previous example, but fails when characters other than digits (such as "." or "/") appear before or after the delimiter.

x <- c("1 - 2","2-1", "-3", "4", "-5-6", "-0.6", "20/200", "20/200-3")

stringr::str_match(x, '(-?\\d+)-?')[, 2]
#> [1] "1"  "2"  "-3" "4"  "-5" "-0" "20" "20"

Desired output:

#> [1] "1"  "2"  "-3" "4"  "-5" "-0.6" "20/200" "20/200"
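A possible alternative, sketched here and not taken from the answers below: the first attempt loses "-3" because "(.*)" may match the empty string, so a leading "-" counts as the last delimiter. Requiring at least one character before the delimiter with "(.+)" leaves strings whose only "-" is the first character untouched, and trimws() drops the space left over from "1 - 2". This assumes the delimiter is always "-"; the helper name drop_after_last_dash() is hypothetical.

```r
# Hypothetical helper: drop everything from the last "-" onwards, but only
# when at least one character precedes that "-"; a lone leading "-" is kept.
drop_after_last_dash <- function(x) {
  trimws(sub("(.+)-.*$", "\\1", x))
}

x1 <- c("1 - 2", "2-1", "-3", "4", "-5-6")
x2 <- c("1 - 2", "2-1", "-3", "4", "-5-6", "-0.6", "20/200", "20/200-3")

print(drop_after_last_dash(x1))
#> [1] "1"  "2"  "-3" "4"  "-5"
print(drop_after_last_dash(x2))
#> [1] "1"      "2"      "-3"     "4"      "-5"     "-0.6"   "20/200" "20/200"
```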

(For the curious: this is for converting notations of visual acuity data, which tells us how well we can discriminate letters on a chart. This data can sometimes be very messy, but it generally follows a certain pattern of notation.)

Upvotes: 1

Views: 302

Answers (2)

Chris Ruehlemann

Reputation: 21400

This seems to get what you want:

stringr::str_extract(x, "(-)?\\d+[.\\d/]*(?=-?)")
[1] "1"      "2"      "-3"     "4"      "-5"     "-0.6"   "20/200" "20/200"

This matches an optional "-", followed by one or more digits, followed by ".", a digit or "/" zero or more times (*), all to the left of ((?= ...)) an optional "-".
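A quick check in base R (using regexpr() with perl = TRUE instead of stringr, an assumption made only to keep the example dependency-free): because "-?" can also match the empty string, the lookahead "(?=-?)" always succeeds, so dropping it (and the unused capture group around "-") gives identical matches.

```r
x <- c("1 - 2", "2-1", "-3", "4", "-5-6", "-0.6", "20/200", "20/200-3")

# First match per element, with and without the trailing lookahead
with_lookahead    <- regmatches(x, regexpr("(-)?\\d+[.\\d/]*(?=-?)", x, perl = TRUE))
without_lookahead <- regmatches(x, regexpr("-?\\d+[.\\d/]*", x, perl = TRUE))

identical(with_lookahead, without_lookahead)
#> [1] TRUE
```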

EDIT:

A base R solution:

unlist(regmatches(x, gregexpr("^(-)?\\d+[.\\d/]*(?=-?)", x, perl = TRUE)))
[1] "1"      "2"      "-3"     "4"      "-5"     "-0.6"   "20/200" "20/200"
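One caveat, sketched here as an assumption-laden variant: unlist(regmatches(..., gregexpr(...))) silently drops elements that have no match, so the result can end up shorter than x and no longer line up with it. A minimal alternative, assuming NA is an acceptable placeholder for non-matching entries, uses regexpr() (first match only) and fills the gaps. The helper name extract_first() and the "n/a" input are hypothetical.

```r
# Hypothetical helper: first regex match per element, NA where nothing
# matches, so the output always has the same length as the input.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1L
  out[hit] <- regmatches(x, m)  # regmatches() returns only the matched elements
  out
}

y <- c("-5-6", "n/a", "20/200-3")
pattern <- "^(-)?\\d+[.\\d/]*"

unlist(regmatches(y, gregexpr(pattern, y, perl = TRUE)))
#> [1] "-5"     "20/200"
extract_first(y, pattern)
#> [1] "-5"     NA       "20/200"
```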

Data:

x <- c("1 - 2","2-1", "-3", "4", "-5-6", "-0.6", "20/200", "20/200-3")

Upvotes: 2

Ronak Shah

Reputation: 388982

Using str_match:

stringr::str_match(x, '(-?\\d+)-?')[, 2]
#[1] "1"  "2"  "-3" "4"  "-5"

This captures an optional "-" followed by a number which is followed by another optional "-".


Using str_extract:

stringr::str_extract(x, '-?\\d+(?=-?)')

and in base R:

sub("(-?\\d+)-?.*", "\\1", x)
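The edit in the question shows where these patterns stop: "\\d+" covers digits only, so "-0.6" and "20/200" are cut at the "." or "/". A minimal sketch of the same sub() approach with the character class widened to digits, "." and "/" (an assumption about which characters can appear in the acuity notation) reproduces the desired output for the extended example:

```r
x <- c("1 - 2", "2-1", "-3", "4", "-5-6", "-0.6", "20/200", "20/200-3")

# Keep an optional leading "-", then a digit followed by any run of digits,
# "." or "/"; everything after that (e.g. " - 2" or "-3") is discarded.
res <- sub("^(-?[0-9][0-9./]*).*$", "\\1", x, perl = TRUE)
res
#> [1] "1"      "2"      "-3"     "4"      "-5"     "-0.6"   "20/200" "20/200"
```

Strings that do not start with an optional "-" and a digit are returned unchanged, since sub() only replaces where the pattern matches.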

Upvotes: 1
