Bort Edwards

Reputation: 39

Find words beginning with a certain character, remove that character and add other characters to the end, in R

I have a large data frame in which one column, "scientificName", contains various scientific names and their authors. Some of these names are hybrids, which are denoted by a "×" in front (NB this is the multiplication symbol ×, NOT a standard text x). Some hybrids have the symbol in front of the first word of the name, but I am only interested in those with it in front of the second word (e.g. "Rosa ×obtusa Ripart"). What I would like to do is go through the column "scientificName", remove the × sign at the beginning of the second word, and append _x (plain text "x") to the end of that same word, i.e.:

Rosa ×obtusa Ripart -> Rosa obtusa_x Ripart

I had started with

library(stringr)
df$scientificName[str_detect(df$scientificName, "×")]

but have tied myself in knots trying to pick only the second word, let alone removing and adding characters.

Any help gratefully received! Toy dataset here (only the third entry should be modified):

df <- data.frame(stuff=c("hybrids", "are", "annoying"), scientificName=c("×Conyzigeron huelsenii (Vatke) Rauschert","Viola wittrockiana Koppert", "Rosa ×obtusa Ripart"))
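The str_detect() call above flags every name that contains × anywhere, so it also picks up the first toy entry, where × sits in front of the genus. A minimal base-R sketch of restricting the match to × at the start of the second word follows; it uses grepl() in place of stringr::str_detect() and assumes the genus and the epithet are separated by a single space.

```r
# Toy data from the question; "\u00d7" is the multiplication sign ×
df <- data.frame(
  stuff = c("hybrids", "are", "annoying"),
  scientificName = c("\u00d7Conyzigeron huelsenii (Vatke) Rauschert",
                     "Viola wittrockiana Koppert",
                     "Rosa \u00d7obtusa Ripart"),
  stringsAsFactors = FALSE
)

# Any × anywhere in the string: flags entries 1 and 3
has_times <- grepl("\u00d7", df$scientificName)

# × only at the start of the second word: from the start of the string,
# one or more non-space characters, a single space, then ×
second_word_hybrid <- grepl("^\\S+ \u00d7", df$scientificName)

has_times          # TRUE FALSE TRUE
second_word_hybrid # FALSE FALSE TRUE

# Keep only the names that need modifying
df$scientificName[second_word_hybrid]
```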

Upvotes: 0

Views: 195

Answers (1)

Ronak Shah

Reputation: 389335

Using sub, you can try:

sub('^(\\w+)\\s(×)(\\w+)', '\\1 \\3_x', df$scientificName)

#[1] "×Conyzigeron huelsenii (Vatke) Rauschert"
#[2] "Viola wittrockiana Koppert"
#[3] "Rosa obtusa_x Ripart"

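For strings where × appears at the start of the second word, this captures the first word and the characters after ×, drops the ×, and appends _x to the second word. Strings that do not match the pattern (such as names with × in front of the first word) are returned unchanged by sub.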
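A minimal sketch of writing the result back into the data frame, assuming the toy data from the question. It differs from the answer in two optional ways: × is written as the escape "\u00d7" so the script does not depend on the encoding the source file is saved in, and the capture group around × is dropped because it is never used in the replacement (so the second word becomes \\2 instead of \\3).

```r
df <- data.frame(
  stuff = c("hybrids", "are", "annoying"),
  scientificName = c("\u00d7Conyzigeron huelsenii (Vatke) Rauschert",
                     "Viola wittrockiana Koppert",
                     "Rosa \u00d7obtusa Ripart"),
  stringsAsFactors = FALSE
)

# \\1 = first word (genus), \\2 = second word without its leading ×.
# Elements that do not match (no × in front of the second word) are
# returned unchanged by sub().
df$scientificName <- sub("^(\\w+)\\s\u00d7(\\w+)", "\\1 \\2_x",
                         df$scientificName)

df$scientificName
# "×Conyzigeron huelsenii (Vatke) Rauschert"
# "Viola wittrockiana Koppert"
# "Rosa obtusa_x Ripart"
```

Since the question already uses stringr, stringr::str_replace() should accept the same pattern and replacement, as it also supports \\1-style back-references.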

Upvotes: 2
