How to remove a pattern from rows in a dataframe in R?

Question

My data has rows containing institute names, usually with an email address at the end. I want to remove only the email addresses and keep the institute names (e.g. remove hello@canada).

df <- data.frame(institute = c(
"Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada",
"Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hello@canada",
"Aix-Marseille Universit.., Inserm, TAGC UMR S1090, 13288 Marseille, France. name@inserm",
"Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand",
"Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand emailX@yahoo.com"))

My goal is to count identical institutes as one. In the format above, the email addresses make otherwise identical rows distinct.

I tried the code below for the first institute, but it didn't remove the complete email address.

a <- "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hello@canada"
gsub("[^.*?]@.*", "\\1", a)
# [1] "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hell"

Answers (1)
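
In the attempt above, `[^.*?]` is a character class: it matches exactly one character that is not `.`, `*` or `?`, so only `o@canada` is matched and `hell` is left behind. One possible fix, as a minimal sketch, is to match the whole run of non-space characters around the `@`, anchored at the end of the string. The pattern and the names `email_pattern` and `institute_clean` are illustrative assumptions, not part of the original post, and the pattern assumes the address is always the last token and that an optional "Electronic address:" label is the only prefix to strip.

```r
# Sample data from the question
df <- data.frame(institute = c(
  "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada",
  "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hello@canada",
  "Aix-Marseille Universit.., Inserm, TAGC UMR S1090, 13288 Marseille, France. name@inserm",
  "Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand",
  "Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand emailX@yahoo.com"),
  stringsAsFactors = FALSE)

# Hypothetical pattern (an assumption for illustration):
#   \\.?                        optional full stop before the address
#   \\s*                        optional whitespace
#   (Electronic address:\\s*)?  optional label
#   \\S+@\\S+                   the address: non-space characters around "@"
#   \\s*$                       optional trailing whitespace, then end of string
email_pattern <- "\\.?\\s*(Electronic address:\\s*)?\\S+@\\S+\\s*$"

# sub() is enough because at most one match per string is expected;
# rows without an "@" are returned unchanged
df$institute_clean <- trimws(sub(email_pattern, "", df$institute, perl = TRUE))

# Count how many rows belong to each cleaned institute
counts <- table(df$institute_clean)
print(counts)
```

With the sample data, the five rows collapse to three distinct institutes. If the real data contains addresses in the middle of a string, or other labels before the address, the pattern would need to be adjusted.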
