Sam

Reputation: 145

R to use stringr::str_extract g

I have a character vector like:

l <- c("0_Mango_10a", "0_Orange_10b", "0_Apple_11")

I need to extract Mango_10a, Orange_10b and Apple_11.

My current code is:

stringr::str_extract(l, "(?<=_)[:alnum:]+")

But I get Mango, Orange and Apple.

Could anyone help me get the desired result?

Thanks in advance!

Upvotes: 0

Views: 244

Answers (3)

Ronak Shah

Reputation: 388982

You can remove the text before the first underscore.

Using sub in base R:

l <- c("0_Mango_10a" , "0_Orange_10b",  "0_Apple_11")

sub('.*?_', '', l)
#[1] "Mango_10a"  "Orange_10b" "Apple_11" 

Or with stringr::str_remove:

stringr::str_remove(l, '.*?_')
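
The `?` after `.*` is what limits the match to the first underscore. As a minimal sketch (the variable names `lazy` and `greedy` are only illustrative and not part of the original answer), comparing the lazy and greedy forms on the same vector shows the difference:

```r
l <- c("0_Mango_10a", "0_Orange_10b", "0_Apple_11")

# Lazy: .*? stops at the first underscore, so only the "0_" prefix is removed.
lazy <- sub(".*?_", "", l)

# Greedy: .* runs to the last underscore, so the fruit name is removed too.
greedy <- sub(".*_", "", l)

print(lazy)
print(greedy)
```

The greedy form would leave only the part after the last underscore, which is not what the question asks for.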

Upvotes: 0

ThomasIsCoding

Reputation: 101335

Here are two base R options:

> gsub("^\\d+_", "", l)
[1] "Mango_10a"  "Orange_10b" "Apple_11"

> unlist(regmatches(l, gregexpr("(?<=_).*", l, perl = TRUE)))
[1] "Mango_10a"  "Orange_10b" "Apple_11"

Upvotes: 0

akrun

Reputation: 887118

Use trimws from base R (the whitespace argument requires R 3.6.0 or later), specifying the "whitespace" to trim as one or more digits (\\d+) followed by an underscore (_):

trimws(l, whitespace = "\\d+_")
[1] "Mango_10a"  "Orange_10b" "Apple_11"  

With stringr, str_remove can be used:

stringr::str_remove(l, "^\\d+_")
[1] "Mango_10a"  "Orange_10b" "Apple_11"  

In the str_extract call from the question, the character class matches only alphanumeric characters and not _, so the match stops at the second underscore. If we include _ in the class, it works:

stringr::str_extract(l, "(?<=_)[[:alnum:]_]+")
[1] "Mango_10a"  "Orange_10b" "Apple_11"  
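
For readers without stringr installed, the same effect can be reproduced in base R. This is only a sketch: it swaps in `regmatches` with `regexpr(..., perl = TRUE)` as a stand-in for `str_extract`, and the names `without_underscore` and `with_underscore` are illustrative.

```r
l <- c("0_Mango_10a", "0_Orange_10b", "0_Apple_11")

# Class without "_": the first match ends at the second underscore.
without_underscore <- regmatches(l, regexpr("(?<=_)[[:alnum:]]+", l, perl = TRUE))

# Class with "_": the match continues through the underscore to the end.
with_underscore <- regmatches(l, regexpr("(?<=_)[[:alnum:]_]+", l, perl = TRUE))

print(without_underscore)
print(with_underscore)
```

The first call reproduces the result from the question (Mango, Orange, Apple), and the second returns the full Mango_10a, Orange_10b and Apple_11.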

Upvotes: 3
