Henry Holm

Reputation: 545

Select a string ending at the first instance of a character with a regular expression

Say I have the following string:

pos/S881.LMG1810.QE009562.mzML

And wish to select the beginning from that string:

pos/S881.

I can use the following regex to match from the start of the string (^), then any character (.), any number of times (*), ending with a literal decimal point (\.):

^.*\.

However, this terminates at the last decimal point in the string and thus gives me:

pos/S881.LMG1810.QE009562.

How do I terminate the selection at the first decimal point?

Upvotes: 7

Views: 717

Answers (5)

ThomasIsCoding

Reputation: 102920

Another regex approach is to use sub with the pattern "(^.*?\\.).*", where the lazy quantifier .*? stops at the first dot and only the captured group is kept, e.g.,

> sub("(^.*?\\.).*", "\\1", "pos/S881.LMG1810.QE009562.mzML")
[1] "pos/S881."
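
To see why the ? matters, here is a small sketch that runs the same sub call with a greedy and a lazy quantifier inside the capture group. The variable names (s, greedy, lazy) are only for illustration.

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# Greedy: the captured .* runs to the last dot it can reach.
greedy <- sub("(^.*\\.).*", "\\1", s)

# Lazy: the captured .*? stops at the first dot.
lazy <- sub("(^.*?\\.).*", "\\1", s)

print(greedy)
# [1] "pos/S881.LMG1810.QE009562."
print(lazy)
# [1] "pos/S881."
```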

Upvotes: 2

JvdV

Reputation: 76000

Alternatively just use sub():

s <- 'pos/S881.LMG1810.QE009562.mzML'
sub("\\..*", ".", s)
# [1] "pos/S881."
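  • \\..* - Match a literal dot followed by zero or more characters. The leftmost match starts at the first dot and runs to the end of the string, so the whole tail is replaced by a single ".".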
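
One thing worth checking with this approach is what happens when a string contains no dot at all. A minimal sketch, assuming a made-up second element "nodot" that is not part of the question:

```r
# "nodot" is a hypothetical input added only to show the no-match case.
s <- c("pos/S881.LMG1810.QE009562.mzML", "nodot")

# sub() is vectorised; elements without a match are returned unchanged.
result <- sub("\\..*", ".", s)

print(result)
# [1] "pos/S881." "nodot"
```

So a string without a dot passes through untouched rather than gaining a trailing dot or becoming NA, which may or may not be what you want.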

Upvotes: 7

akrun

Reputation: 887981

We can use a regex lookbehind ((?<=\\.)) to match the characters that follow a . and remove them with trimws (the whitespace argument needs R 3.6.0 or later):

trimws(str1, whitespace = "(?<=\\.).*")
[1] "pos/S881."

Or, starting from the beginning (^) of the string, extract the characters that are not a . ([^.]+) followed by a dot (a metacharacter, thus escaped):

library(stringr)
str_extract(str1, "^[^.]+\\.")
[1] "pos/S881."

data

str1 <- "pos/S881.LMG1810.QE009562.mzML"
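
If you would rather not load stringr, the same negated-character-class pattern can be used from base R. This is only a sketch: prefix_to_first_dot is a hypothetical helper name, and "nodot" is a made-up input to show the no-match case. It returns NA where there is no dot, similar to what str_extract does.

```r
# Hypothetical helper: text up to and including the first dot, NA if no dot.
prefix_to_first_dot <- function(x) {
  m <- regexpr("^[^.]+\\.", x)          # -1 where the pattern does not match
  hit <- !is.na(m) & m != -1L
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)          # regmatches() returns matches only
  out
}

res <- prefix_to_first_dot(c("pos/S881.LMG1810.QE009562.mzML", "nodot"))
print(res)
# [1] "pos/S881." NA
```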

Upvotes: 6

TarJae

Reputation: 79311

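We could use strsplit. Splitting on the literal dot and indexing the first piece extracts the part before the first dot. Note that the dot itself is dropped, so it has to be pasted back if you need it:

x <- "pos/S881.LMG1810.QE009562.mzML"
strsplit(x, "\\.")[[1]][1]
[1] "pos/S881"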
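
Since the question wants the trailing dot kept, a possible follow-up is to paste it back on. A sketch for a vector of file names, where the second file name is invented for illustration:

```r
# The second element is a hypothetical file name, not from the question.
x <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S882.LMG1811.QE009563.mzML")

# fixed = TRUE treats "." as a literal character, so no escaping is needed.
parts <- strsplit(x, ".", fixed = TRUE)

# Take the first piece of each split and restore the dot that strsplit removed.
first <- vapply(parts, `[`, character(1), 1)
result <- paste0(first, ".")

print(result)
# [1] "pos/S881." "neg/S882."
```

Be aware that this appends a dot even to strings that never contained one.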

Upvotes: 4

Henry Holm

Reputation: 545

I'm accepting @akrun's answer for the quick response, but I also found that adding the "?" modifier makes "*" non-greedy, so my original expression works with that one change:

stringr::str_extract("pos/S881.LMG1810.QE009562.mzML", "^.*?\\.")
[1] "pos/S881."

Upvotes: 3
