Reputation: 323
I am attempting to remove an appended "s.#" from a column in a data frame:
Species <- c("Dogs.1","Dogs.2","Dogs.3","Cats.1","Cats.2","Cats.3")
Breed <- c("Great Dane","Beagle","Beagle","Bengal","Tabby","Siamese")
# Build the data frame directly; column names are taken from the variable names
pets <- data.frame(Species, Breed)
This produces the following data frame:
Species Breed
1 Dogs.1 Great Dane
2 Dogs.2 Beagle
3 Dogs.3 Beagle
4 Cats.1 Bengal
5 Cats.2 Tabby
6 Cats.3 Siamese
I'd like the output to look more like this:
Species Breed
1 Dog Great Dane
2 Dog Beagle
3 Dog Beagle
4 Cat Bengal
5 Cat Tabby
6 Cat Siamese
Is there a way of manipulating the Species column to take out the ".#"?
Upvotes: 0
Views: 94
Reputation: 8120
Here is another solution:
library(stringr)
str_extract(pets$Species, "^.*(?=s)")
[1] "Dog" "Dog" "Dog" "Cat" "Cat" "Cat"
When a data frame is in long format and its strings follow a something.# or something_# pattern, the trailing # often carries information that is useful later for grouping, faceting, statistics, or data visualization. That may not apply in your case, but here is a way to split the two parts so that the appended information is kept.
library(tidyr)
library(dplyr)
library(stringr)
new_pets <- pets %>%
  separate(col = Species, into = c("type", "owner"), sep = "\\.") %>%
  mutate(type = str_extract(type, "^.*(?=s)"))
new_pets
# type owner Breed
# 1 Dog 1 Great Dane
# 2 Dog 2 Beagle
# 3 Dog 3 Beagle
# 4 Cat 1 Bengal
# 5 Cat 2 Tabby
# 6 Cat 3 Siamese
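For readers without the tidyverse installed, the same split can be sketched in base R. This is only an illustration, assuming every Species value has the form name.number. The name pets_split and the split() call at the end are hypothetical additions that show one way the retained owner column could be used for grouping.

```r
# Same example data as in the question
pets_split <- data.frame(
  Species = c("Dogs.1", "Dogs.2", "Dogs.3", "Cats.1", "Cats.2", "Cats.3"),
  Breed   = c("Great Dane", "Beagle", "Beagle", "Bengal", "Tabby", "Siamese"),
  stringsAsFactors = FALSE
)

# Text before the dot (minus an optional trailing s) becomes the type
pets_split$type <- sub("s?\\..*$", "", pets_split$Species)

# Text after the last dot becomes the owner number
pets_split$owner <- as.integer(sub("^.*\\.", "", pets_split$Species))

# Hypothetical follow-up: list the breeds belonging to each owner number
breeds_by_owner <- split(pets_split$Breed, pets_split$owner)
breeds_by_owner[["1"]]
# [1] "Great Dane" "Bengal"
```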
Upvotes: 0
Reputation: 133760
EDIT: To also remove the s from the Species column, use the following.
sub("s\\..*","",pets$Species)
To handle both lowercase s and uppercase S, use the following.
sub("[Ss]\\..*","",pets$Species)
Could you please try the following? It removes the dot and everything after it, keeping the trailing s.
sub("\\..*","",pets$Species)
Or, if the Species column always ends in a dot followed by digits, use the following.
sub("\\.[0-9]+","",pets$Species)
To save the output back into the data frame column itself, use the following.
pets$Species <- sub("\\..*","",pets$Species)
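To make the differences between these patterns concrete, here is a small side-by-side comparison. The vector x and the result names are made up for illustration; the uppercase entry is included only to show where the [Ss] variant matters.

```r
# Hypothetical sample values, including one with an uppercase S
x <- c("Dogs.1", "Cats.3", "DOGS.2")

# Drop the dot and everything after it (the s is kept)
dot_only <- sub("\\..*", "", x)
# [1] "Dogs" "Cats" "DOGS"

# Drop the dot and the digits that follow it (the s is kept)
dot_digits <- sub("\\.[0-9]+", "", x)
# [1] "Dogs" "Cats" "DOGS"

# Also drop a lowercase s immediately before the dot;
# "DOGS.2" has no lowercase s there, so it is left unchanged
lower_s <- sub("s\\..*", "", x)
# [1] "Dog"    "Cat"    "DOGS.2"

# Drop either s or S immediately before the dot
any_s <- sub("[Ss]\\..*", "", x)
# [1] "Dog" "Cat" "DOG"
```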
Upvotes: 1
Reputation: 522712
We can use sub here. The pattern below removes a dot followed by one or more digits at the very end of the Species text. It also removes an optional letter s that may or may not occur just before the dot.
pets$Species <- sub("s?\\.\\d+$", "", pets$Species)
pets
Species Breed
1 Dog Great Dane
2 Dog Beagle
3 Dog Beagle
4 Cat Bengal
5 Cat Tabby
6 Cat Siamese
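A quick check of how this pattern behaves on a few other inputs may help when the data is less uniform. The helper name strip_suffix and the sample values are hypothetical; note in particular that a word whose singular form ends in s loses that letter too.

```r
# Hypothetical helper wrapping the pattern from this answer
strip_suffix <- function(x) sub("s?\\.\\d+$", "", x)

# Multi-digit suffix, no s before the dot, no suffix at all,
# and a word whose singular form itself ends in s
res <- strip_suffix(c("Dogs.12", "Fish.2", "Dogs", "Bus.1"))
res
# [1] "Dog"  "Fish" "Dogs" "Bu"
```

If values like the last one can occur in your data, a pattern that leaves the s alone, such as "\\.\\d+$", may be the safer choice.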
Upvotes: 1