Reputation: 2256
A beginner question...
I have a list like this:
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")
(but many more lines)
I need to extract what is between "bb=" and ",". I.e. I want:
x21
g-25
26
Having read many similar questions here, I suppose it is stringr with str_extract I should use, but somehow I can't get it to work. Thanks for all help.
/Chris
Upvotes: 3
Views: 336
Reputation: 263489
Read it in with commas as separators and take the second column:
> x.split <- read.table(textConnection(x), header=FALSE, sep=",", stringsAsFactors=FALSE)[[2]]
> x.split
[1] " bb=x21" " bb=g-25" " bb=26"
Then remove the "bb=" together with the leading space left over from the split:
> gsub("^ *bb=", "", x.split)
[1] "x21" "g-25" "26"
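Taking the second column only works while bb= is always the second field. If its position can vary, a sketch along these lines looks the value up by key instead. The helper name get_value is made up for illustration, and it assumes the fields are separated by commas and each one has the form key=value.

```r
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# Hypothetical helper: return the value stored under `key` in each string,
# or NA when the key is absent.
get_value <- function(strings, key) {
  vapply(strings, function(s) {
    fields <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])  # "aa=v12" "bb=x21" "cc=f35"
    keys   <- sub("=.*$", "", fields)                      # text before the first "="
    values <- sub("^[^=]*=", "", fields)                   # text after the first "="
    hit <- match(key, keys)                                # position of the first matching key
    if (is.na(hit)) NA_character_ else values[hit]
  }, character(1), USE.NAMES = FALSE)
}

res <- get_value(x, "bb")
print(res)
```

The result has one entry per input string, so it lines up with x even when some strings have no bb= field.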
Upvotes: 1
Reputation: 69251
Here's one solution using the base regex functions in R. First we use strsplit to split on the comma. Then we use grepl to filter only the items that contain bb= and gsub to extract all the characters after bb=.
> x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")
> y <- unlist(strsplit(x , ","))
> unlist(lapply(y[grepl("bb=", y)], function(x) gsub("^.*bb=(.*)", "\\1", x)))
[1] "x21" "g-25" "26"
It looks like str_replace is the function you are after if you want to go the stringr route:
> library(stringr)
> str_replace(y[grepl("bb=",y)], "^.*bb=(.*)", "\\1")
[1] "x21" "g-25" "26"
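Since the question asked about str_extract specifically, here is a minimal sketch of how it might be used directly on x, without splitting first. It assumes stringr is installed. The lookbehind (?<=bb=) requires "bb=" to come right before the match without including it in the result, and [^,]* takes everything up to the next comma.

```r
library(stringr)

x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# Match the run of non-comma characters that immediately follows "bb=".
res <- str_extract(x, "(?<=bb=)[^,]*")
print(res)
```

str_extract returns NA for strings with no match, so the result keeps the same length as x.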
Upvotes: 1
Reputation: 270428
strapply in the gsubfn package can do that. Note that [^,]* matches a string of non-commas. strapply extracts the back referenced portion (the part within parentheses):
> library(gsubfn)
> strapply(x, "bb=([^,]*)", simplify = TRUE)
[1] "x21" "g-25" "26"
If there are several x vectors then provide them in a list like this:
> strapply(list(x, x), "bb=([^,]*)")
[[1]]
[1] "x21" "g-25" "26"
[[2]]
[1] "x21" "g-25" "26"
Upvotes: 4
Reputation: 4469
An option using regexpr:
> temp = regexpr('bb=[^,]*', x)
> substr(x, temp + 3, temp + attr(temp, 'match.length') - 1)
[1] "x21" "g-25" "26"
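The offset arithmetic above can be avoided by letting regmatches pull out the matched text. This is a sketch of a base R variant, not a replacement for the approach above. It uses a Perl-style lookbehind so that "bb=" is required before the match but is not part of it.

```r
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# perl = TRUE enables the (?<=...) lookbehind syntax.
m <- regexpr("(?<=bb=)[^,]*", x, perl = TRUE)

# regmatches returns only the substrings that matched.
res <- regmatches(x, m)
print(res)
```

One caveat: regmatches drops elements that have no match, so the result can be shorter than x if some strings lack a bb= field.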
Upvotes: 2