Iris

Reputation: 1162

extract text from string in R

I have a lot of strings that all look similar, e.g.:

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"

I would like to extract Whatiwant, Whatiwanttoo, and Whatiwa in R.

I started with substring(x1, 15, 23), but I don't know how to generalize it, since the position and length of the part I want differ from string to string. How can I always extract the part between the last _ and the .txt?

Thank you!

Upvotes: 0

Views: 422

Answers (2)

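You can also use the stringr library, with functions like str_extract (among many others), if you would rather not dig deep into regular expressions. It is extremely easy to use. Note that the pattern below assumes the part you want always starts with "What" followed only by lowercase letters: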

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"
library(stringr)
patron <- "(What)[a-z]+"
str_extract(x1, patron)
## [1] "Whatiwant"
str_extract(x2, patron)
## [1] "Whatiwanttoo"
str_extract(x3, patron)
## [1] "Whatiwa"
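
If the wanted part does not always start with "What", a more general pattern may help. The following is only a sketch, under the assumption that every string ends in ".txt" and that the wanted part contains no underscore; the names x, general_pattern and extracted are made up for this illustration.

```r
library(stringr)

# Same example strings as above, collected into one vector
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# [^_]+        one or more characters that are not underscores
# (?=\\.txt$)  lookahead: must be followed by ".txt" at the end of the string
general_pattern <- "[^_]+(?=\\.txt$)"

# str_extract is vectorised, so all strings are handled in one call
extracted <- str_extract(x, general_pattern)
extracted
## [1] "Whatiwant"    "Whatiwanttoo" "Whatiwa"
```

For a string that does not end in ".txt", str_extract returns NA, so it is easy to spot inputs that break the assumption.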

Upvotes: 0

NicE

Reputation: 21443

You can use regexp capture groups:

gsub(".*_([^_]*)\\.txt","\\1",x1)
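
To spell out how the capture group works, here is a small sketch applied to all three strings at once. It uses sub instead of gsub (there is only one match per string) and anchors the pattern with $, which is my own tightening, not part of the line above. The second half shows a regex-free alternative; both assume the file names end in ".txt" and contain at least one underscore. The names x, result, stem, parts and alt are illustrative only.

```r
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# ".*_"       greedy: everything up to and including the last underscore
# "([^_]*)"   capture group 1: the run of non-underscore characters after it
# "\\.txt$"   a literal ".txt" at the end of the string
# "\\1"       replace the whole match with the contents of group 1
result <- sub(".*_([^_]*)\\.txt$", "\\1", x)
result
## [1] "Whatiwant"    "Whatiwanttoo" "Whatiwa"

# Alternative without a regular expression:
# drop the extension, split on "_", keep the last piece
stem  <- tools::file_path_sans_ext(x)
parts <- strsplit(stem, "_", fixed = TRUE)
alt   <- vapply(parts, function(p) p[length(p)], character(1))
alt
## [1] "Whatiwant"    "Whatiwanttoo" "Whatiwa"
```

One thing to keep in mind: when a string does not match the pattern, sub returns it unchanged rather than NA, so unexpected inputs pass through silently.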

Upvotes: 2
