Reputation: 4100
I want to extract the words that contain the "@" symbol and remove all the other words. My data looks like this:
Author Content
Name1 Hi,@tim how are you @Blue.
Name2 @xyz, are you ok?
Name3 it is good @my @you
where Author and Content are the column names.
I want the data in the format below:
Author Content
Name1 tim
Name1 Blue
Name2 xyz
Name3 my
Name3 you
So I only want the words that are prefixed with the "@" symbol, and everything else should be dropped.
Upvotes: 0
Views: 1007
Reputation: 887038
We can use str_extract_all from stringr to extract the words (\\w+) that follow @ in the 'Content' column, grouped by 'Author'. Here, I used data.table methods for the group-by operation, after converting the 'data.frame' to a 'data.table' with setDT(df1).
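library(data.table)
library(stringr)
# Convert df1 to a data.table by reference, then, for each Author,
# extract every run of word characters that is preceded by "@"
setDT(df1)[, .(Content = unlist(str_extract_all(Content, "(?<=@)\\w+"))),
           by = Author]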
# Author Content
#1: Name1 tim
#2: Name1 Blue
#3: Name2 xyz
#4: Name3 my
#5: Name3 you
# Data used above
df1 <- structure(list(Author = c("Name1", "Name2", "Name3"),
                      Content = c("Hi,@tim how are you @Blue.",
                                  "@xyz, are you ok?",
                                  "it is good @my @you")),
                 .Names = c("Author", "Content"), class = "data.frame",
                 row.names = c(NA, -3L))
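If loading data.table and stringr is not an option, the same lookbehind pattern should also work in base R. The following is only a sketch of that alternative, not part of the approach above: it rebuilds the sample data under the assumed name df2, and the names matches and res are likewise chosen just for illustration.

```r
# Sample data, rebuilt here so the sketch runs on its own (df2 is an assumed name)
df2 <- data.frame(Author  = c("Name1", "Name2", "Name3"),
                  Content = c("Hi,@tim how are you @Blue.",
                              "@xyz, are you ok?",
                              "it is good @my @you"),
                  stringsAsFactors = FALSE)

# gregexpr() needs perl = TRUE for the lookbehind (?<=@);
# regmatches() then returns one character vector of matches per row
matches <- regmatches(df2$Content,
                      gregexpr("(?<=@)\\w+", df2$Content, perl = TRUE))

# Repeat each Author once per match and flatten the matches into one column
res <- data.frame(Author  = rep(df2$Author, lengths(matches)),
                  Content = unlist(matches),
                  stringsAsFactors = FALSE)
res
```

Rows without any "@" word yield zero matches, so their Author is repeated zero times and simply drops out of res.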
Upvotes: 3