Reputation: 39
I have a large data frame, one column of which, "scientificName", contains scientific names and their authors. Some of these names are hybrids, which are denoted by an "×" in front (NB this is the multiplication symbol ×, NOT a standard text x). Some hybrids have the symbol in front of the first word in the name, but I am only interested in those with it in front of the second (e.g. "Rosa ×obtusa Ripart"). What I would like to do is go through the column "scientificName", remove the sign at the beginning of the second word, and append _x (plain text "x") to the end of the same word, i.e.
Rosa ×obtusa Ripart -> Rosa obtusa_x Ripart
I had started with
df$scientificName[str_detect(df$scientificName, "×")]
but have tied myself in knots trying to pick only the second word, let alone removing and adding characters.
Any help gratefully received! Toy dataset here (only the third entry should be modified):
df <- data.frame(stuff=c("hybrids", "are", "annoying"), scientificName=c("×Conyzigeron huelsenii (Vatke) Rauschert","Viola wittrockiana Koppert", "Rosa ×obtusa Ripart"))
Upvotes: 0
Views: 195
Reputation: 389335
Using sub, you can try:
sub('^(\\w+)\\s(×)(\\w+)', '\\1 \\3_x', df$scientificName)
#[1] "×Conyzigeron huelsenii (Vatke) Rauschert"
#[2] "Viola wittrockiana Koppert"
#[3] "Rosa obtusa_x Ripart"
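For strings whose second word starts with ×, this captures the first word and the characters after ×, then rebuilds the name without the × and with _x appended to the second word. Strings that do not match the pattern, including those with × in front of the first word, are returned unchanged.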
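To write the result back into the data frame, something along these lines should work. It is only a sketch on the toy data from the question: `pattern` is an illustrative name, and the capture group around × is dropped here because the replacement never uses it.

```r
# Toy data from the question
df <- data.frame(
  stuff = c("hybrids", "are", "annoying"),
  scientificName = c("×Conyzigeron huelsenii (Vatke) Rauschert",
                     "Viola wittrockiana Koppert",
                     "Rosa ×obtusa Ripart"),
  stringsAsFactors = FALSE
)

# ^(\\w+)  first word, captured as group 1
# \\s×     one whitespace character followed by the hybrid sign (not kept)
# (\\w+)   second word without the sign, captured as group 2
pattern <- "^(\\w+)\\s×(\\w+)"

# Overwrite the column; rows that do not match are left as they were
df$scientificName <- sub(pattern, "\\1 \\2_x", df$scientificName)

df$scientificName
#[1] "×Conyzigeron huelsenii (Vatke) Rauschert"
#[2] "Viola wittrockiana Koppert"
#[3] "Rosa obtusa_x Ripart"
```

Because the pattern is anchored with ^ and requires a plain word before the ×, names with the sign in front of the first word are not touched.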
Upvotes: 2