Reputation: 1751
I have the following string, from which I want to extract the content between the second and third colons (ES5206956802492 in the example):
"20160607181026_0000005:0607181026000000501:ES5206956802492:479"
I am using R and specifically the stringr package to manipulate strings. The command I attempted to use is:
str_extract("20160607181026_0000005:0607181026000000501:ES5206956802492:479", ":(.*):")
where the regex pattern is the last argument. This produces the following result:
":0607181026000000501:ES5206956802492:"
I know that there is a way of grouping results and back-referencing them, which would allow me to select only the part I am interested in, but I can't figure out the right syntax.
How can I achieve this?
Upvotes: 2
Views: 283
Reputation: 51592
Also, word from stringr:
library(stringr)
word(v1, 3, sep=':')
#[1] "ES5206956802492"
Upvotes: 3
Reputation: 887138
If the field we want is the one that starts with an uppercase letter right after the :, we can use a compact regex. Here we use a regex lookbehind ((?<=:)) to match an uppercase letter ([A-Z]) that follows the :, followed by one or more characters that are not a : ([^:]+).
str_extract(v1, "(?<=:)[A-Z][^:]+")
#[1] "ES5206956802492"
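For readers without stringr, the same lookbehind can be sketched in base R. This is only an illustration of the idea above, assuming the wanted field is the first one that begins with an uppercase letter; note that base R needs perl = TRUE for lookbehind to be recognised.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"

# regexpr() locates the first match; perl = TRUE enables the (?<=:) lookbehind
m <- regexpr("(?<=:)[A-Z][^:]+", v1, perl = TRUE)

# regmatches() pulls out the matched substring
res <- regmatches(v1, m)
res
```

If no field starts with an uppercase letter, regmatches() returns character(0) rather than NA, so the result length should be checked before use.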
Or, if it is based on position (i.e. the field after the 2nd :), a base R option is to match zero or more non-: characters ([^:]*) followed by the first :, then zero or more non-: characters followed by the second :, then capture the following run of non-: characters in a group ((...)), and finally match the rest of the characters (.*). In the replacement, we use the backreference \\1 (the first capture group).
sub("[^:]*:[^:]*:([^:]+).*", "\\1", v1)
#[1] "ES5206956802492"
Or the repeated part can be grouped and quantified to make it more compact, in which case the wanted field becomes the second capture group:
sub("([^:]*:){2}([^:]+).*", "\\2", v1)
#[1] "ES5206956802492"
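Since the question asks about capture groups specifically, here is a minimal sketch that extracts the group directly instead of replacing around it. The base R part uses regexec() and regmatches(); the stringr part assumes the package is installed and uses str_match(), which returns a matrix whose first column is the full match and whose later columns are the capture groups. The variable names are only illustrative.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"

# Skip two colon-terminated fields, then capture the third field
pattern <- "^[^:]*:[^:]*:([^:]*)"

# Base R: element 1 is the full match, element 2 is the first capture group
res_base <- regmatches(v1, regexec(pattern, v1))[[1]][2]
res_base

# stringr equivalent (only run if the package is available)
if (requireNamespace("stringr", quietly = TRUE)) {
  res_stringr <- stringr::str_match(v1, pattern)[, 2]
  print(res_stringr)
}
```

The original attempt returned too much because .* is greedy and also matches colons; restricting each field to [^:]* keeps the match from running past the next delimiter.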
Or with strsplit, we split at the delimiter : and extract the 3rd element.
strsplit(v1, ":")[[1]][3]
#[1] "ES5206956802492"
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"
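If there are several such strings, the strsplit approach extends to a vector. The following is a sketch under that assumption; the second input string and the helper name third_field are made up for illustration.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"
x <- c(v1, "a:b:c:d")  # second element is a hypothetical extra input

# Split every string at ":" (fixed = TRUE treats the delimiter literally),
# then take the 3rd piece of each; strings with fewer than 3 fields give NA
third_field <- function(s) {
  parts <- strsplit(s, ":", fixed = TRUE)
  vapply(parts, function(p) p[3], character(1))
}

res <- third_field(x)
res
```

vapply() is used rather than sapply() so that the result is always a character vector, even for empty input.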
Upvotes: 2