Reputation: 49

Is there an R function for removing parts of row names?

I have a data frame with a column "SampleID" containing a series of sample IDs, all of which end with "_Dup". I want to remove the "_Dup" suffix from every value in that column.

Here's an example of the df:

df
SampleID      Concentration
sample1_Dup   1
sample2_Dup   2
sample3_Dup   3

The result I'm looking for is this:

df
SampleID      Concentration
sample 1      1
sample 2      2
sample 3      3

I've searched for solutions to this problem using base R and the tidyverse but haven't been able to find anything on modifying all the values in a column.

Upvotes: 1

Answers (4)

CarLinneo

Reputation: 47

I think I would just create a new variable like this.

df$new_var<-substr(df$SampleID, 1,7)

That should take the first 7 characters of each string and put them in a new column.
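
A fixed width of 7 only works while every ID has the same length. As a rough sketch of an alternative, using a made-up vector of IDs (the two-digit one is an assumption added for illustration), the last four characters, i.e. the length of "_Dup", can be dropped instead:

```r
# Hypothetical example IDs, including one with a two-digit number
ids <- c("sample1_Dup", "sample2_Dup", "sample10_Dup")

# Fixed width: keeps the first 7 characters, so "sample10_Dup" is cut short
fixed <- substr(ids, 1, 7)

# Relative width: drop the last 4 characters ("_Dup") whatever the ID length
trimmed <- substr(ids, 1, nchar(ids) - 4)
trimmed
```

This still assumes every value really ends in "_Dup"; a regular expression anchored to the end of the string is safer if that is not guaranteed.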

Upvotes: 1

Curt F.

Reputation: 4824

A tidyverse-style solution:

library(dplyr)
library(stringr)
df %>%
    mutate(SampleID = SampleID %>% str_replace('(.*)([0-9])_Dup$', '\\1 \\2'))

The tidyverse-style string manipulation functions come from stringr, and their names all begin with str_. They can use regular expressions. Here we use capture groups, which are the parts of the regular expression inside parentheses. The first capture group, (.*), is everything that comes before the single digit immediately preceding "_Dup". That single digit is the second capture group, ([0-9]). We put the two parts back together with a space between them in the third argument to str_replace, which is \\1 \\2.
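
Because the second group matches exactly one digit, an ID such as "sample10_Dup" would come out as "sample1 0". A minimal sketch of a variant that captures the whole run of digits, assuming dplyr and stringr are installed and using a made-up data frame (the two-digit ID is not in the question):

```r
library(dplyr)
library(stringr)

# Hypothetical data modelled on the question, plus one two-digit ID
df <- data.frame(
  SampleID = c("sample1_Dup", "sample2_Dup", "sample10_Dup"),
  Concentration = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# "(\\d+)_Dup$" captures every digit before the suffix;
# "\\1" in the replacement refers to that first capture group
result <- df %>%
  mutate(SampleID = str_replace(SampleID, "(\\d+)_Dup$", " \\1"))

result$SampleID
```

Only one capture group is needed here, since the text before the digits is left untouched by the replacement.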

Upvotes: 0

Dasax121

Reputation: 23

You could split the column in two with tidyr's separate(), and then remove the unwanted column.


separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
  convert = FALSE, extra = "warn", fill = "warn", ...)
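
The signature above is generic, so here is one possible way to apply it to the question's data, assuming dplyr and tidyr are installed. The column name "suffix" is a placeholder chosen for this sketch:

```r
library(dplyr)
library(tidyr)

# Data frame reconstructed from the question
df <- data.frame(
  SampleID = c("sample1_Dup", "sample2_Dup", "sample3_Dup"),
  Concentration = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Split SampleID at the underscore into the ID and a throwaway "suffix"
# column, then drop the suffix column
result <- df %>%
  separate(SampleID, into = c("SampleID", "suffix"), sep = "_") %>%
  select(-suffix)

result
```

Note that this yields "sample1" rather than "sample 1", and it assumes each ID contains exactly one underscore.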

Upvotes: 0

Tim Biegeleisen

Reputation: 520978

Try using sub:

df$SampleID <- sub("(\\d+)_[^_]+$", " \\1", df$SampleID)
df$SampleID

[1] "sample 1" "sample 2" "sample 3"

The strategy here is to match and capture the sample number, followed by the final underscore and the rest of the sample ID. Then, we replace with just a space followed by that captured sample number.
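
For comparison, a small base R sketch on a stand-alone vector (the variable names are illustrative only) shows both the plain suffix removal described in the question text and the spaced form shown in its expected output:

```r
ids <- c("sample1_Dup", "sample2_Dup", "sample3_Dup")

# Remove only a trailing "_Dup"; "$" anchors the match to the end of the string
no_suffix <- sub("_Dup$", "", ids)

# Capture the trailing digits and re-insert them after a space
spaced <- sub("(\\d+)_Dup$", " \\1", ids)

no_suffix
spaced
```

The first pattern is narrower than "_[^_]+$", which would strip any final underscore-delimited part, not just "_Dup".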

Upvotes: 2
