R: Removing substring by occurrence of character

I have a column, species_name, in the data frame genexp_2016 that holds the common name, scientific name, and location for each of several species. For example, the first few rows look like this:

head(genexp_2016)
                                                       
  rank                                               species_name status
1 1396               Addax (Addax nasomaculatus) - Wherever found      E
2 1313            Babirusa (Babyrousa babyrussa) - Wherever found      E
3 1396     Baboon, gelada (Theropithecus gelada) - Wherever found      T
4  229 Bat, Florida bonneted (Eumops floridanus) - Wherever found      E
5  109             Bat, gray (Myotis grisescens) - Wherever found      E

What I'm trying to do is remove the end of each string in `species_name` so that I am left with only the common name and the scientific name, without the location ('Wherever found').

I have thought about telling R to delete everything after the first occurrence of the - character, but this is an imperfect method, since some species in the data frame have a hyphen in their name, such as the black-footed ferret.

The most effective solution I've thought of is this: have R read each string from the end instead of the beginning and, on finding the first occurrence of -, delete everything from that character's position to the end of the string. It seems like something I should be able to do in R, but I don't yet know how. Does anyone have ideas on how I might code this, or perhaps a more efficient way to remove the location description from each string?

Thanks, and I appreciate any help you can offer.

Upvotes: 0

Views: 170

Answers (1)

s_baldur

Reputation: 33498

To keep everything up to the last - (the keyword here is greedy: .+ matches as much as it can, so the - in the pattern lines up with the final hyphen in the string), you could do:

x <- 'Addax (Addax nasomaculatus) - Wherever found'
sub('(.+)-.+', '\\1', x)
# [1] "Addax (Addax nasomaculatus) "
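
Note that the result above still ends with a space. If that matters, one possible variation is to require whitespace on both sides of the hyphen, which also leaves hyphens inside names alone. This is only a sketch: the small data frame below is a stand-in for genexp_2016, and the black-footed ferret row is a hypothetical string modeled on the format shown in the question.

```r
# Stand-in for genexp_2016; the ferret row is hypothetical, not from the real data
genexp_2016 <- data.frame(
  species_name = c(
    "Addax (Addax nasomaculatus) - Wherever found",
    "Ferret, black-footed (Mustela nigripes) - Wherever found"
  ),
  stringsAsFactors = FALSE
)

# Greedy (.+) extends to the LAST hyphen that has whitespace on both sides,
# so the hyphen in "black-footed" is not treated as the separator.
# \\s+ consumes the padding, leaving no trailing space in the captured group.
cleaned <- sub("^(.+)\\s+-\\s+.+$", "\\1", genexp_2016$species_name)

genexp_2016$species_name <- cleaned
print(genexp_2016$species_name)
# [1] "Addax (Addax nasomaculatus)"
# [2] "Ferret, black-footed (Mustela nigripes)"
```

Strings that contain no " - " separator are returned unchanged, because sub() only replaces when the pattern matches.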

Upvotes: 0
