Reputation: 49
I have a data frame with a column "SampleID" containing sample IDs, all of which end with "_Dup". I want to remove the "_Dup" suffix from every value in that column.
Here's an example of the df:
df
SampleID Concentration
sample1_Dup 1
sample2_Dup 2
sample3_Dup 3
The result I'm looking for is this:
df
SampleID Concentration
sample 1 1
sample 2 2
sample 3 3
I've searched for solutions to this problem using base R and the tidyverse, but haven't been able to find anything on modifying every value in a column.
Upvotes: 1
Views: 1048
Reputation: 47
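I would just create a new variable like this:
df$new_var <- substr(df$SampleID, 1, 7)
That takes the first 7 characters of each string (e.g. "sample1") and puts them in a new column. Note that this only works if every ID has exactly 7 characters before "_Dup", and it does not insert the space shown in the expected output.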
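If the IDs can have different lengths (for example a hypothetical "sample12_Dup"), a fixed width of 7 will cut some of them short. A minimal sketch, assuming every ID ends with exactly "_Dup", is to compute the end position from nchar() and drop the last 4 characters instead:

```r
# Example IDs, including a hypothetical two-digit one to show the difference
ids <- c("sample1_Dup", "sample2_Dup", "sample12_Dup")

# Fixed width: always keeps exactly the first 7 characters
fixed_width <- substr(ids, 1, 7)
fixed_width
# "sample1" "sample2" "sample1"

# Length-aware: drop the last 4 characters ("_Dup") whatever the ID length
trimmed <- substr(ids, 1, nchar(ids) - 4)
trimmed
# "sample1" "sample2" "sample12"
```

This still assumes the suffix is always present; an ID without "_Dup" would lose its last 4 characters anyway.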
Upvotes: 1
Reputation: 4824
A tidyverse-style solution:
library(dplyr)
library(stringr)
df <- df %>%
  mutate(SampleID = SampleID %>% str_replace('(.*)([0-9])_Dup$', '\\1 \\2'))
The tidyverse-style string manipulation functions come from stringr, and all have names of the form str_XYZ. They accept regular expressions. Here we used capture groups, which are the parts of the regular expression inside parentheses. The first group, (.*), is everything that comes before the final digit. That single digit is the second group, ([0-9]). In the third argument to str_replace, \\1 \\2, we put the two parts back together with a space between them.
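Because (.*) is greedy and ([0-9]) matches only one digit, this pattern splits a multi-digit ID such as a hypothetical "sample12_Dup" in the wrong place. The sketch below shows this with base R's sub() standing in for str_replace() (so it runs without stringr; both treat these two patterns the same way), alongside a variant that puts only non-digits in the first group and one or more digits in the second:

```r
ids <- c("sample1_Dup", "sample12_Dup")

# Pattern from the answer: greedy (.*) keeps everything except the last digit
greedy <- sub('(.*)([0-9])_Dup$', '\\1 \\2', ids)
greedy
# "sample 1" "sample1 2"

# Variant: non-digits in group 1, one or more digits in group 2
multi_digit <- sub('([^0-9]*)([0-9]+)_Dup$', '\\1 \\2', ids)
multi_digit
# "sample 1" "sample 12"
```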
Upvotes: 0
Reputation: 23
You could split the column into two with tidyr's separate() and then remove the unwanted column. Its usage is:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
         convert = FALSE, extra = "warn", fill = "warn", ...)
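Applied to the example data, a minimal sketch might look like this. It assumes splitting on "_" is enough; per the tidyr documentation, an NA entry in `into` omits that piece, so the "Dup" part is discarded without a separate drop step:

```r
library(tidyr)

# Example data from the question
df <- data.frame(
  SampleID = c("sample1_Dup", "sample2_Dup", "sample3_Dup"),
  Concentration = c(1, 2, 3)
)

# Split "sample1_Dup" at the underscore into "sample1" and "Dup".
# remove = TRUE (the default) drops the original column, and the NA
# in `into` discards the "Dup" piece instead of keeping it as a column.
df2 <- separate(df, SampleID, into = c("SampleID", NA), sep = "_")
df2$SampleID
```

Like the substr() answer, this yields "sample1" rather than "sample 1"; it does not add the space shown in the question's expected output.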
Upvotes: 0
Reputation: 520978
Try using sub:
df$SampleID <- sub("(\\d+)_[^_]+$", " \\1", df$SampleID)
df$SampleID
[1] "sample 1" "sample 2" "sample 3"
The strategy here is to match and capture the sample number, followed by the final underscore and the rest of the sample ID. Then, we replace with just a space followed by that captured sample number.
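If you only need to strip the suffix and do not want the space, an anchored literal pattern is enough. A minimal sketch on the question's example data:

```r
df <- data.frame(
  SampleID = c("sample1_Dup", "sample2_Dup", "sample3_Dup"),
  Concentration = c(1, 2, 3)
)

# "_Dup$" matches the suffix only at the end of the string,
# so it is replaced with "" and the rest of each ID is unchanged
df$SampleID <- sub("_Dup$", "", df$SampleID)
df$SampleID
# "sample1" "sample2" "sample3"
```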
Upvotes: 2