Reputation: 161
I want to extract specific emails (@enron.com) from the 'To' column of my data frame. Some rows contain more than one email. For example, one row contains: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
My question is: how can I extract just the Enron-domain (@enron.com) emails from this column and save them in a new column? I can extract them, but each email ends up in its own row, which is not what I want. For example, if a row contains 10 Enron emails out of 20, I want all 10 Enron emails in that one row, not spread over 10 rows. I ran the code from "How to extract expression matching an email address in a text file using R or Command Line?":
emails = regmatches(df, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron.com)", df))
but I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 2, 0, 5
Upvotes: 1
Views: 295
Reputation: 887118
We can use grepl for this. It keeps every row whose 'To' field contains at least one Enron address:
subset(df, grepl("enron.com", To))
If there are multiple emails in a single row, use str_extract_all from stringr and paste the matches for each row back into a single string:
library(stringr)
data.frame(To = sapply(str_extract_all(df$To, "\\S+@enron.com"), paste, collapse = ","))
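For comparison, here is a minimal base R sketch of the same idea, without stringr. The sample addresses, the `pattern` variable, and the new `Enron` column name are made up for illustration and are not from the question. The main difference from the code in the question is that the regex is applied to the `df$To` column rather than to the whole data frame, and each row's matches are collapsed into one string. The "differing number of rows" error is consistent with trying to build data frame columns from match vectors of unequal length, which collapsing avoids.

```r
# Hypothetical sample data; these addresses are invented for illustration.
df <- data.frame(
  To = c("alice@enron.com, bob@example.org, carol.smith@enron.com",
         "dave@example.org",
         "erin@enron.com"),
  stringsAsFactors = FALSE
)

# Escape the dot so it only matches a literal "." in the domain.
pattern <- "[[:alnum:]._-]+@enron\\.com"

# gregexpr() finds all matches in each element of the column, and
# regmatches() returns them as a list with one character vector per row.
matches <- regmatches(df$To, gregexpr(pattern, df$To, ignore.case = TRUE))

# Collapse each row's matches into a single string, so the result has
# exactly one element per row. Rows with no Enron address become "".
df$Enron <- vapply(matches, paste, character(1), collapse = ", ")

print(df$Enron)
```

With this sample data, the first row keeps both Enron addresses in one cell, the second row gets an empty string, and the third keeps its single address.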
Upvotes: 1