Reputation: 13025
I have some strings that follow this pattern:
ABC, DEF.JHI
AB,DE.(JH)
Generally, each string has three sections, separated by "," and ".". The last character can be either an ordinary character or something like ")". I would like to extract the last part. For example, from the strings above I would like to get the following two strings:
JHI
(JH)
Is there a way to do that in R?
Upvotes: 0
Views: 231
Reputation: 25854
Riffing on @josiber's answer, you could remove everything up to and including the last "." in the string:
str1 <- c("ABC, DEF.JHI","AB,DE.(JH)")
gsub(".*\\.", "", str1)
# [1] "JHI" "(JH)"
EDIT
In case your third element is not always preceded by a ".", match either separator to extract the final part:
str1 <- c("ABC, DEF.JHI","AB,DE.(JH)", "ABC.DE, (JH)")
gsub(".*[,.]", "" , str1)
# [1] "JHI" "(JH)" " (JH)"
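The leading space in the last result comes from the ", " separator. A minimal sketch, assuming surrounding whitespace is never wanted, wraps the same substitution in base R's trimws() (available from R 3.2.0); the name last_part is only for illustration:

```r
str1 <- c("ABC, DEF.JHI", "AB,DE.(JH)", "ABC.DE, (JH)")

# Drop everything up to and including the last "," or ".",
# then strip any whitespace left around the remaining part.
last_part <- trimws(sub(".*[,.]", "", str1))
last_part
# last_part is c("JHI", "(JH)", "(JH)")
```

sub() is enough here because the greedy .* already consumes everything up to the last separator in a single match.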
Upvotes: 1
Reputation: 109874
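Here's another possibility, with str1 defined as in the other answers. Note that it also splits on the parentheses, so the second result comes back as "JH" rather than "(JH)":
sapply(strsplit(str1, "\\.\\(|\\.|\\)"), "[[", 2)
# [1] "JHI" "JH"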
Upvotes: 1
Reputation: 44320
You can just split on the "." using strsplit and extract the second element.
str1 <- c("ABC, DEF.JHI","AB,DE.(JH)")
unlist(lapply(strsplit(str1, "\\."), "[", 2))
# [1] "JHI" "(JH)"
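Taking the second element relies on each string containing exactly one ".". A sketch of a variant that takes the last piece instead, assuming the wanted part always follows the final "." and that no input string is empty; pieces and last_piece are illustrative names:

```r
str1 <- c("ABC, DEF.JHI", "AB,DE.(JH)")

# fixed = TRUE treats "." as a literal character, so no escaping is needed.
pieces <- strsplit(str1, ".", fixed = TRUE)

# Take the last piece of each split; vapply() guarantees a character vector.
last_piece <- vapply(pieces, function(p) tail(p, 1), character(1))
last_piece
# last_piece is c("JHI", "(JH)")
```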
Upvotes: 1
Reputation: 887118
library(stringr)
str1 <- c("ABC, DEF.JHI","AB,DE.(JH)")
str_extract(str1,perl('(?<=\\.).*'))
#[1] "JHI" "(JH)"
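(?<=\\.) is a lookbehind: it matches a position immediately preceded by a literal ".", and .* then matches all the characters that follow. In stringr 1.0 and later the perl() wrapper is deprecated, and the pattern string can be passed to str_extract() directly.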
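For comparison, the same lookbehind can be used without stringr. This is a sketch in base R, assuming every string contains a "."; regmatches() silently drops elements that have no match, so the result can be shorter than the input otherwise. The name after_dot is illustrative:

```r
str1 <- c("ABC, DEF.JHI", "AB,DE.(JH)")

# perl = TRUE enables lookbehind; regexpr() locates the first match per string
# and regmatches() extracts the matched text.
after_dot <- regmatches(str1, regexpr("(?<=\\.).*", str1, perl = TRUE))
after_dot
# after_dot is c("JHI", "(JH)")
```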
Upvotes: 1