James

Reputation: 1457

regex - return all before the second occurrence

Given this string:

DNS000001320_309.0/121.0_t0

How would I return everything before the second occurrence of "_"?

DNS000001320_309.0/121.0

I am using R.

Thanks.

Upvotes: 21

Views: 21586

Answers (5)

LMc

Reputation: 18622

library(strex)

str_before_nth("DNS000001320_309.0/121.0_t0", "_", 2)
# [1] "DNS000001320_309.0/121.0"
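If you would rather not add a package, the same idea can be sketched in base R by locating the delimiter positions with gregexpr() and cutting just before the n-th one. The helper name before_nth is hypothetical, and returning the input unchanged when there are fewer than n delimiters is a choice made for this sketch, not a claim about how strex handles that case.

```r
# Hypothetical helper: everything before the n-th literal occurrence of `pattern`.
# Assumption: inputs with fewer than n occurrences are returned unchanged.
before_nth <- function(x, pattern, n) {
  pos <- gregexpr(pattern, x, fixed = TRUE)
  # Position of the last character to keep, or NA if there is no n-th match
  ends <- vapply(pos, function(p) {
    if (length(p) >= n && p[1] != -1) p[n] - 1L else NA_integer_
  }, integer(1))
  ifelse(is.na(ends), x, substr(x, 1, ends))
}

before_nth("DNS000001320_309.0/121.0_t0", "_", 2)
# [1] "DNS000001320_309.0/121.0"
```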

Upvotes: 0

daroczig

Reputation: 28632

I think this might do the task. The regex below matches the last _ and everything after it, so removing the match leaves everything before the last occurrence of _:

_([^_]*)$

E.g.:

> sub('_([^_]*)$', '', "DNS000001320_309.0/121.0_t0")
[1] "DNS000001320_309.0/121.0"
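Note that this cuts at the last underscore, which is the same as the second one only when the string has exactly two. A small check with a made-up input ("a_b_c_d" is not from the question) shows the difference:

```r
x <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d")

# Removes the final "_" and whatever follows it
last_cut <- sub("_([^_]*)$", "", x)
last_cut
# [1] "DNS000001320_309.0/121.0" "a_b_c"
```

For the second string the result keeps everything up to the third underscore, so this approach fits only if your data always has exactly two.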

Upvotes: 13

Bart Kiers

Reputation: 170148

The following script:

s <- "DNS000001320_309.0/121.0_t0"
t <- gsub("^([^_]*_[^_]*)_.*$", "\\1", s)
t

will print:

DNS000001320_309.0/121.0

A quick explanation of the regex:

^         # the start of the input
(         # start group 1
  [^_]*   #   zero or more chars other than `_`
  _       #   a literal `_`
  [^_]*   #   zero or more chars other than `_`
)         # end group 1
_         # a literal `_`
.*        # consume the rest of the string
$         # the end of the input

which is replaced with:

\\1       # whatever is matched in group 1

And if there are fewer than 2 underscores, the string is not changed.
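Since sub() is vectorized, the same pattern can be applied to a whole character vector. The sketch below uses a few made-up inputs (everything except the first string is an assumption for illustration) to show what happens with more or fewer than two underscores:

```r
x <- c("DNS000001320_309.0/121.0_t0", "a_b_c_d", "a_b", "abc")

# Keep group 1 (everything before the second "_"); non-matching strings pass through
res <- sub("^([^_]*_[^_]*)_.*$", "\\1", x)
res
# [1] "DNS000001320_309.0/121.0" "a_b" "a_b" "abc"
```

Because the pattern is anchored at both ends, it can match at most once per string, so sub() and gsub() give the same result here.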

Upvotes: 54

darckeen

Reputation: 960

Not pretty, but this will do the trick:

mystr <- "DNS000001320_309.0/121.0_t0"

mytok <- paste(strsplit(mystr,"_")[[1]][1:2],collapse="_")

Upvotes: 7

joran

Reputation: 173537

Personally, I hate regex, so luckily there's a way to do this without them, just by splitting the string:

> s <- "DNS000001320_309.0/121.0_t0"      
> paste(strsplit(s,"_")[[1]][1:2],collapse = "_")
[1] "DNS000001320_309.0/121.0"

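Although of course this assumes that every string splits into at least two pieces. With no underscore at all, the [1:2] index pads with NA and you get something like "abc_NA". Also, [[1]] only looks at the first element, so be careful if you vectorize this.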
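One way to make the split approach safer is to wrap it in a small function that works over a vector and never indexes past the available pieces. This is only a sketch; the name before_second is hypothetical, and head(p, 2) is swapped in for [1:2] so that short inputs are not padded with NA.

```r
# Hypothetical helper: split on a literal separator and keep at most two pieces
before_second <- function(x, sep = "_") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  # head() returns fewer than 2 elements when that is all there is
  vapply(pieces, function(p) paste(head(p, 2), collapse = sep), character(1))
}

before_second(c("DNS000001320_309.0/121.0_t0", "a_b", "abc"))
# [1] "DNS000001320_309.0/121.0" "a_b" "abc"
```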

Upvotes: 12
