Reputation: 323

How to remove variable characters in a column?

I am attempting to remove an appended "s.#" from a column in a data frame:

Species <- c("Dogs.1","Dogs.2","Dogs.3","Cats.1","Cats.2","Cats.3")
Breed <- c("Great Dane","Beagle","Beagle","Bengal","Tabby","Siamese")

# data.frame() names the columns after the vectors, so no names() calls are needed
pets <- data.frame(Species, Breed, stringsAsFactors = FALSE)

This produces the following data frame:

  Species      Breed
1  Dogs.1 Great Dane
2  Dogs.2     Beagle
3  Dogs.3     Beagle
4  Cats.1     Bengal
5  Cats.2      Tabby
6  Cats.3    Siamese

I'd like the output to look more like this:

  Species      Breed
1     Dog Great Dane
2     Dog     Beagle
3     Dog     Beagle
4     Cat     Bengal
5     Cat      Tabby
6     Cat    Siamese

Is there a way of manipulating the Species column to take out the ".#"?

Upvotes: 0

Answers (3)

AndS.

Reputation: 8120

Here is another solution:

library(stringr)
str_extract(pets$Species, "^.*(?=s)")
[1] "Dog" "Dog" "Dog" "Cat" "Cat" "Cat"

When a data frame is in long format and its strings look like something.# or something_#, I often find that the appended # holds valuable information that can be used later for grouping, faceting, statistics, or data visualization. I'm not sure whether that applies in your case, but here is a way to split the two pieces apart and keep the appended information.

library(tidyr)
library(dplyr)
library(stringr)
new_pets <- pets %>%
    separate(col = Species, into = c("type", "owner"), sep = "\\.") %>%
    mutate(type = str_extract(type, "^.*(?=s)"))

new_pets
#   type owner      Breed
# 1  Dog     1 Great Dane
# 2  Dog     2     Beagle
# 3  Dog     3     Beagle
# 4  Cat     1     Bengal
# 5  Cat     2      Tabby
# 6  Cat     3    Siamese

Upvotes: 0

RavinderSingh13

Reputation: 133760

EDIT: To remove the s from the Species column as well, use the following.

sub("s\\..*","",pets$Species)

To cover both lowercase s and uppercase S, use the following.

sub("[Ss]\\..*","",pets$Species)

Could you please try the following.

sub("\\..*","",pets$Species)

Or, if the Species column always contains a dot followed by digits, use the following.

sub("\\.[0-9]+","",pets$Species)

If you want to save the output back into the data frame column itself, use the following.

pets$Species <- sub("\\..*","",pets$Species)
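
To see how the patterns above differ, here is a small side-by-side sketch. The vector x is made up for illustration (it is not the asker's data); "Fish.2" is included on the assumption that some values might not end in "s" before the dot.

```r
# Hypothetical sample values; "Fish.2" has no "s" directly before the dot.
x <- c("Dogs.1", "Cats.12", "Fish.2")

# Drop everything from the first dot onward.
drop_suffix <- sub("\\..*", "", x)

# Drop the first dot-plus-digits run.
drop_digits <- sub("\\.[0-9]+", "", x)

# Also drop a lowercase "s" that sits directly before the dot.
drop_s <- sub("s\\..*", "", x)

# Same, but accept either "s" or "S".
drop_sS <- sub("[Ss]\\..*", "", x)

print(drop_suffix)  # "Dogs" "Cats" "Fish"
print(drop_digits)  # "Dogs" "Cats" "Fish"
print(drop_s)       # "Dog" "Cat" "Fish.2"
print(drop_sS)      # "Dog" "Cat" "Fish.2"
```

Note that the two patterns requiring an s leave "Fish.2" untouched, because nothing matches when there is no s before the dot. That is fine for the data in the question, but worth keeping in mind if the column is less uniform.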

Upvotes: 1

Tim Biegeleisen

Reputation: 522712

We can use sub here. The pattern below will remove a dot followed by one or more digits, occurring as the very last thing in the Species text. I also remove an optional letter s which might (or might not) occur before the dot.

pets$Species <- sub("s?\\.\\d+$", "", pets$Species)
pets

  Species      Breed
1     Dog Great Dane
2     Dog     Beagle
3     Dog     Beagle
4     Cat     Bengal
5     Cat      Tabby
6     Cat    Siamese
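
As a rough check of how the optional s and the $ anchor behave, here is a short sketch on a few made-up values. None of these come from the question's data; they are assumptions chosen to probe edge cases.

```r
# Hypothetical inputs: no trailing "s", a dot inside the name,
# no numeric suffix at all, and a singular word that ends in "s".
x <- c("Dogs.1", "Fish.2", "St.Bernards.3", "Dogs", "Walrus.1")

# Same pattern as above: optional "s", a dot, digits, end of string.
cleaned <- sub("s?\\.\\d+$", "", x)

print(cleaned)
# "Dog" "Fish" "St.Bernard" "Dogs" "Walru"
```

The anchor keeps the dot inside "St.Bernards.3" intact, and a value without a numeric suffix is left alone. The last value shows the limit of this approach: the pattern cannot tell a plural s from one that belongs to the word.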

Upvotes: 1
