Reputation: 1075
I have a dataframe mydf:
  Content            term
1 Search Term: abc|  NA
2 Search Term-xyz    NA
3 Search Term-pqr|   NA
I made this regex:
\Search Term[:]?.?([a-zA-Z]+)\
to get terms like abc, xyz and pqr.
How do I extract these terms into the term column? I tried str_match and gsub, but I am not getting the correct results.
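A minimal base-R sketch of using the question's own pattern as a capture group. It assumes the backslashes around the regex are delimiters rather than part of the pattern, and writes `[:]?` as the equivalent `:?`. A leading and trailing `.*` are added so that sub() replaces the whole string with only capture group 1 (`\\1`). The data frame is rebuilt from the question's sample rows.

```r
# Rebuild the question's sample data (column names taken from the question)
mydf <- data.frame(
  Content = c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|"),
  term = NA_character_,
  stringsAsFactors = FALSE
)

# The question's regex wrapped in .* ... .*, so the whole string is replaced
# by capture group 1: the letters after the optional ":" and one separator
pattern <- ".*Search Term:?.?([a-zA-Z]+).*"
mydf$term <- sub(pattern, "\\1", mydf$Content, perl = TRUE)
mydf$term
#[1] "abc" "xyz" "pqr"
```

Note that sub() returns a non-matching string unchanged rather than NA. With stringr, the capture group is the second column of the matrix that str_match() returns (the first column is the full match), so the equivalent would be str_match(mydf$Content, "Search Term:?.?([a-zA-Z]+)")[, 2].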
Upvotes: 2
Views: 61
Reputation: 51582
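Just to demonstrate the word function of stringr:
library(stringr)
# word(..., -1) takes the last space-separated word: "abc|", "Term-xyz", "Term-pqr|"
df$term <- gsub('.*-', '', word(df$Content, -1))   # drop everything up to the last "-"
df$term <- gsub('[[:punct:]]', '', df$term)         # strip leftover punctuation such as "|"
df$term
#[1] "abc" "xyz" "pqr"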
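Here word(df$Content, -1) picks the last space-separated word. As a rough base-R sketch of the same steps, assuming the words are separated by single spaces, the last word can be taken with sub() instead of word():

```r
content <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")

last_word <- sub(".* ", "", content)        # like word(content, -1): "abc|" "Term-xyz" "Term-pqr|"
no_prefix <- gsub(".*-", "", last_word)     # drop everything up to the last "-"
term <- gsub("[[:punct:]]", "", no_prefix)  # strip leftover punctuation such as "|"
term
#[1] "abc" "xyz" "pqr"
```

[[:punct:]] also removes punctuation inside a term (a hyphenated one, for example), so this approach suits single-word terms like the ones in the question.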
Upvotes: 1
Reputation: 141
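gsub will help you:
content <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")
# build the data frame directly; cbind() would first coerce everything into a character matrix
test123 <- data.frame(content = content, term = NA_character_, stringsAsFactors = FALSE)
# remove everything up to the last whitespace run or "-", plus any trailing non-lowercase characters
test123$term <- gsub(".*(\\s+|-)|[^a-z]+$", "", test123$content)
test123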
            content term
1 Search Term: abc|  abc
2   Search Term-xyz  xyz
3  Search Term-pqr|  pqr
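The single gsub() call above removes two things through the alternation `|`: everything up to and including the last whitespace run or hyphen (`.*(\\s+|-)`), and any trailing run of non-lowercase characters (`[^a-z]+$`). A sketch that applies the two alternatives as separate steps makes each one visible; like the original pattern, it assumes the terms are lowercase:

```r
content <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")

# Step 1: drop everything up to and including the last whitespace run or "-"
step1 <- sub(".*(\\s+|-)", "", content)
step1
#[1] "abc|" "xyz"  "pqr|"

# Step 2: drop any trailing characters that are not lowercase letters (here "|")
step2 <- sub("[^a-z]+$", "", step1)
step2
#[1] "abc" "xyz" "pqr"
```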
Upvotes: 0
Reputation: 887048
If the strings do not end with | (e.g. "Search Term: abc"), we can try with sub:
sub(".*(\\s+|-)", "", df1$Content)
#[1] "abc" "xyz" "pqr"
Or
library(stringr)
str_extract(df1$Content, "\\w+$")
#[1] "abc" "xyz" "pqr"
If the | is also found at the end of the string:
gsub(".*(\\s+|-)|[^a-z]+$", "", df1$Content)
#[1] "abc" "xyz" "pqr"
Or
str_extract(df1$Content, "\\w+(?=(|[|])$)")
#[1] "abc" "xyz" "pqr"
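The lookahead `(?=(|[|])$)` requires the word to be followed either by nothing or by a single literal | and then the end of the string; `\\|?` is an equivalent, shorter way to write that. Base R's default regex engine does not support lookaheads, so a stringr-free sketch of the same extraction uses regexpr() with perl = TRUE together with regmatches():

```r
content <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")

# Word characters followed by an optional "|" and then the end of the string;
# the lookahead checks for the "|" without including it in the match
m <- regexpr("\\w+(?=\\|?$)", content, perl = TRUE)
term <- regmatches(content, m)
term
#[1] "abc" "xyz" "pqr"
```

One difference: regmatches() with regexpr() drops strings that have no match, so the result can be shorter than the input, whereas str_extract() returns NA for those rows.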
Upvotes: 2