Vahid Safinia

Reputation: 161

Extract specific emails from multiple emails in a column - R

I want to extract specific emails (@enron.com) from the 'To' column of my data frame. Some rows contain more than one email. For example, one row holds: [email protected], [email protected], [email protected], [email protected], [email protected],[email protected], [email protected].

How can I extract only the Enron-domain (@enron.com) emails from this column and save them in a new column? I can extract them, but each email ends up in its own row, which is not what I want: if a row contains 10 Enron emails out of 20, all 10 should stay together in one row rather than being spread over 10 rows.

I ran the code from "How to extract expression matching an email address in a text file using R or Command Line?":

emails = regmatches(df, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron.com)", df))

but I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 2, 0, 5.

Upvotes: 1

Views: 295

Answers (1)

akrun

Reputation: 887118

We can use grepl to keep only the rows whose To field contains an Enron address:

subset(df, grepl("@enron.com", To, fixed = TRUE))
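
As a rough illustration of how this filter behaves, the sketch below compares a regex pattern with an unescaped dot against a literal match. The data frame and every address in it are made up for the example. Note that either way the filter keeps whole rows, including any non-Enron addresses they contain.

```r
# Toy data; the addresses are invented for illustration.
df <- data.frame(
  To = c("alice@enron.com, bob@example.org",
         "carol@enronXcom.net",
         "dave@example.org"),
  stringsAsFactors = FALSE
)

# In a regex, "." matches any character, so "enron.com" also hits "enronXcom".
loose <- subset(df, grepl("enron.com", To))

# fixed = TRUE treats the pattern as a literal string instead of a regex.
strict <- subset(df, grepl("@enron.com", To, fixed = TRUE))

nrow(loose)   # rows kept by the regex version
nrow(strict)  # rows kept by the literal version
```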

If there are multiple emails in a single row, use str_extract_all from stringr to pull out every match, then paste the matches back into one string per row:

library(stringr)
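# Match each @enron.com address (local part: letters, digits, '.', '_', '-'), then join the matches per row
data.frame(To = sapply(str_extract_all(df$To, "[A-Za-z0-9._-]+@enron\\.com"), paste, collapse = ","))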
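
For readers who prefer to stay in base R, the same idea can probably be sketched with regmatches and gregexpr, applied to the To column rather than to the whole data frame, and collapsed so that every row keeps a single string. The sample data, the pattern, and the new column name Enron are assumptions made for this example, not taken from the question.

```r
# Made-up data for illustration only.
df <- data.frame(
  To = c("alice@enron.com, bob@example.org,carol@enron.com",
         "dave@example.org",
         "erin@enron.com"),
  stringsAsFactors = FALSE
)

# Assumed pattern: a local part of letters, digits, dot, underscore, or hyphen.
pattern <- "[A-Za-z0-9._-]+@enron\\.com"

# gregexpr() finds all match positions per row; regmatches() turns them into
# a list holding one character vector of matches per row.
matches <- regmatches(df$To, gregexpr(pattern, df$To))

# Collapse each vector into one string so the result has one entry per row.
# Rows without any match become an empty string.
df$Enron <- vapply(matches, paste, character(1), collapse = ", ")

df$Enron
```

Because the matches are collapsed before being assigned, the new column always has exactly one value per row, which avoids building a data frame from vectors of differing lengths.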

Upvotes: 1
