Chris

Reputation: 2256

Extracting values after pattern

A beginner question...

I have a character vector like this:

x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

(but many more lines)

I need to extract what is between "bb=" and ",". I.e. I want:

x21  
g-25  
26  

Having read many similar questions here, I suppose it is stringr with str_extract I should use, but somehow I can't get it to work. Thanks for all help.

/Chris

Upvotes: 3

Views: 336

Answers (4)

IRTFM

Reputation: 263489

Read it in with commas as separators and take the second column:

> x.split <- read.table(textConnection(x), header=FALSE, sep=",", stringsAsFactors=FALSE)[[2]]
> x.split
[1] " bb=x21"  " bb=g-25" " bb=26"

Then remove the "bb=", along with the leading space left over from the split:

> gsub("^ *bb=", "", x.split)
[1] "x21"  "g-25" "26"
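
Taking a fixed column only works if bb= is always the second field. If that is not guaranteed, a minimal base R sketch that does not depend on position could look like this. It assumes every element contains exactly one bb= entry; sub returns any element without a match unchanged, so those would need separate handling.

```r
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# Capture the run of non-comma characters after "bb=" and replace the
# whole string with just that captured group.
bb <- sub("^.*bb=([^,]*).*$", "\\1", x)
bb
# [1] "x21"  "g-25" "26"
```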

Upvotes: 1

Chase

Reputation: 69251

Here's one solution using the base regex functions in R. First we use strsplit to split on the comma. Then we use grepl to keep only the items that contain bb= and gsub to extract all the characters after bb=.

> x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")
> y <- unlist(strsplit(x , ","))
> gsub("^.*bb=(.*)", "\\1", y[grepl("bb=", y)])
[1] "x21"  "g-25" "26" 

If you want to go the stringr route, str_replace is the function you are after:

> library(stringr)
> str_replace(y[grepl("bb=",y)], "^.*bb=(.*)", "\\1")
[1] "x21"  "g-25" "26"
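
The split-on-comma idea can be taken one step further by also splitting each piece on =, which gives a named lookup per string. A rough sketch in base R follows; parse_pairs is a hypothetical helper name, and it assumes every piece has the form key=value with no = or , inside the values.

```r
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# Hypothetical helper: turn one "k1=v1, k2=v2, ..." string into a named
# character vector (names are the keys).
parse_pairs <- function(s) {
  parts <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])  # "aa=v12" "bb=x21" ...
  kv <- strsplit(parts, "=", fixed = TRUE)              # list of c(key, value)
  values <- vapply(kv, function(p) p[2], character(1))
  names(values) <- vapply(kv, function(p) p[1], character(1))
  values
}

# Look up the "bb" entry in each string.
bb <- vapply(x, function(s) parse_pairs(s)[["bb"]], character(1), USE.NAMES = FALSE)
bb
```

This is more code than a single regex, but the same helper can then pull out any other key (for example "aa" or "cc") without writing a new pattern.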

Upvotes: 1

G. Grothendieck

Reputation: 270428

strapply in the gsubfn package can do that. It extracts the portion matched by the capture group (the part within parentheses); here [^,]* matches a run of non-comma characters:

> library(gsubfn)
> strapply(x, "bb=([^,]*)", simplify = TRUE)
[1] "x21"  "g-25" "26"  

If there are several x vectors then provide them in a list like this:

> strapply(list(x, x), "bb=([^,]*)")
[[1]]
[1] "x21"  "g-25" "26"  

[[2]]
[1] "x21"  "g-25" "26"

Upvotes: 4

Charles

Reputation: 4469

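An option using regexpr, which returns the start position of each match and stores its length in the match.length attribute. Adding 3 to the start skips over "bb=":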

> temp = regexpr('bb=[^,]*', x)
> substr(x, temp + 3, temp + attr(temp, 'match.length') - 1)
[1] "x21"  "g-25" "26"  
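
Base R also has regmatches, which does the substr bookkeeping for you. A minimal sketch, assuming a Perl-style lookbehind is acceptable (perl = TRUE) so that bb= must precede the match but is not included in it:

```r
x <- c("aa=v12, bb=x21, cc=f35", "xx=r53, bb=g-25, yy=h48", "nn=u75, bb=26, gg=m98")

# (?<=bb=) is a lookbehind: the match starts right after "bb="
# and runs up to (not including) the next comma.
m <- regexpr("(?<=bb=)[^,]*", x, perl = TRUE)
bb <- regmatches(x, m)
bb
# [1] "x21"  "g-25" "26"
```

Note that regmatches drops elements with no match, so the result can be shorter than x if some strings lack a bb= entry.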

Upvotes: 2
