Tarak

Reputation: 1075

Data frame column vector manipulation

I have a dataframe mydf:

                Content    term
    1 Search Term: abc|    NA
    2 Search Term-xyz      NA
    3 Search Term-pqr|     NA

I wrote a regex:

\Search Term[:]?.?([a-zA-Z]+)\

to capture terms like abc, xyz and pqr.

How do I extract these terms into the term column? I tried str_match and gsub, but I am not getting the correct results.

Upvotes: 2

Views: 61

Answers (3)

Sotos

Reputation: 51582

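Just to demonstrate the word function of stringr:

library(stringr)
# word(x, -1) takes the last space-separated word; gsub then drops everything up to a hyphen
df$term <- gsub('.*-', '', word(df$Content, -1))
# strip any remaining punctuation, such as a trailing |, and store the result
df$term <- gsub('[[:punct:]]', '', df$term)
df$term
#[1] "abc" "xyz" "pqr"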

Upvotes: 1

pmavuluri

Reputation: 141

gsub will help you here:

content <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")
term <- c(NA, NA, NA)
test123 <- as.data.frame(cbind(content, term))
test123$term <- as.character(gsub(".*(\\s+|-)|[^a-z]+$", "", test123$content))
test123
            content term
1 Search Term: abc|  abc
2   Search Term-xyz  xyz
3  Search Term-pqr|  pqr
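
To see why that pattern works, it may help to apply its two alternatives one at a time. The sketch below uses only base R; the names x, step1 and step2 are made up for illustration and are not part of the answer above.

```r
# Illustrative input, copied from the question
x <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")

# First alternative: a greedy .* followed by whitespace or a hyphen
# removes everything up to the last space or hyphen
step1 <- sub(".*(\\s+|-)", "", x)
step1
#[1] "abc|" "xyz"  "pqr|"

# Second alternative: a run of characters other than a-z at the end
# removes the trailing |
step2 <- sub("[^a-z]+$", "", step1)
step2
#[1] "abc" "xyz" "pqr"
```

Note that [^a-z]+$ would also strip trailing uppercase letters or digits, so this assumes the terms are lowercase.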

Upvotes: 0

akrun

Reputation: 887048

If the term is the last thing in each string (no trailing |), we can try sub:

sub(".*(\\s+|-)", "", df1$Content)
#[1] "abc" "xyz" "pqr"

Or

library(stringr)
str_extract(df1$Content, "\\w+$")
#[1] "abc" "xyz" "pqr"

Update

If a | may also appear at the end of the string:

gsub(".*(\\s+|-)|[^a-z]+$", "", df1$Content)
#[1] "abc" "xyz" "pqr"

Or

str_extract(df1$Content, "\\w+(?=(|[|])$)")
#[1] "abc" "xyz" "pqr"
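
Another option, closer to the regex in the question, is to keep the capture group and replace the whole string with it. This is only a sketch: it drops the stray backslashes from the question's pattern, adds a trailing .* to consume the rest of the string, and uses a made-up vector x in place of the Content column.

```r
# Illustrative input, copied from the question
x <- c("Search Term: abc|", "Search Term-xyz", "Search Term-pqr|")

# \\1 refers to the letters captured by ([a-zA-Z]+);
# the trailing .* swallows anything after them, such as |
term <- sub("Search Term[:]?.?([a-zA-Z]+).*", "\\1", x)
term
#[1] "abc" "xyz" "pqr"
```

sub returns a string unchanged when the pattern does not match, so rows without "Search Term" would need separate handling.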

Upvotes: 2
