Reputation: 311
I have a column of strings like this:
ENST00000338167.9
ABCDE.42927.6
ENST00000265393.10
ABCDE.43577.3
ENST00000370826.3
I would like to remove the last '.' and everything after it, but for the 'ENST' entries only, e.g.:
ENST00000338167
ABCDE.42927.6
ENST00000265393
ABCDE.43577.3
ENST00000370826
I can strip the last '.' and everything after it from every entry with
function(x) sub("\\.[^.]*$", "", x)
but if I try to restrict it to the 'ENST' entries with
function(x) sub("ENST*\\.[^.]*$", "", x)
it isn't quite working, and I don't fully understand the regex syntax.
Upvotes: 1
Views: 477
Reputation: 3221
We can use a combination of startsWith and sub:
Data:
df=read.table(text="ENST00000338167.9
ABCDE.42927.6
ENST00000265393.10
ABCDE.43577.3
ENST00000370826.3",header=FALSE)
# if string starts with ENST then remove everything after . (dot) in the
# string else print the string as it is.
ifelse(startsWith(as.character(df$V1), "ENST"), sub("\\..*", "", df$V1),
as.character(df$V1))
Output:
[1] "ENST00000338167" "ABCDE.42927.6" "ENST00000265393" "ABCDE.43577.3" "ENST00000370826"
Upvotes: 0
Reputation: 887951
We can use data.table to specify the logical condition in i while updating the column in j:
library(data.table)
setDT(df)[grepl("^ENST", Col1), Col1 := sub("\\.[^.]+$", "", Col1)]
df
# Col1
#1: ENST00000338167
#2: ABCDE.42927.6
#3: ENST00000265393
#4: ABCDE.43577.3
#5: ENST00000370826
df <- structure(list(Col1 = c("ENST00000338167.9", "ABCDE.42927.6",
"ENST00000265393.10", "ABCDE.43577.3", "ENST00000370826.3")), row.names = c(NA,
-5L), class = "data.frame")
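For comparison, the same "filter rows, then update only those rows" idea can be sketched in base R with a logical index. This is only an illustration, not part of the data.table approach above; the variable name is_enst is made up for the example, and the data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same values as in this answer, rebuilt so the snippet is self-contained.
df <- data.frame(Col1 = c("ENST00000338167.9", "ABCDE.42927.6",
                          "ENST00000265393.10", "ABCDE.43577.3",
                          "ENST00000370826.3"),
                 stringsAsFactors = FALSE)

# Logical index of the rows whose value starts with "ENST"
# (is_enst is a hypothetical helper name).
is_enst <- grepl("^ENST", df$Col1)

# Strip the last "." and what follows it, on those rows only.
df$Col1[is_enst] <- sub("\\.[^.]+$", "", df$Col1[is_enst])
df
```

The rows that do not match the condition are never passed to sub(), so they are left exactly as they were.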
Upvotes: 0
Reputation: 50738
We can use a capture group inside a single gsub call:
gsub("(^ENST\\d+)\\.\\d+", "\\1", df[, 1])
#[1] "ENST00000338167" "ABCDE.42927.6" "ENST00000265393" "ABCDE.43577.3"
#[5] "ENST00000370826"
df <- read.table(text =
"ENST00000338167.9
ABCDE.42927.6
ENST00000265393.10
ABCDE.43577.3
ENST00000370826.3", header = FALSE)
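As a rough sketch of why the anchored capture group works where the pattern from the question does not, the two can be compared side by side. The vector x and the result names below are assumptions made for this illustration, and the capture-group pattern is written with an extra "$" anchor, which is slightly stricter than the one used above.

```r
x <- c("ENST00000338167.9", "ABCDE.42927.6")

# Pattern from the question: "T*" means "zero or more T", so the "." would
# have to come directly after "ENS", "ENST", "ENSTT", and so on. The digits
# in between prevent any match, and sub() returns the input unchanged.
question_result <- sub("ENST*\\.[^.]*$", "", x)

# Anchored capture group: "^" pins the match to the start of the string,
# "(ENST\\d+)" captures the ID, "\\.\\d+$" matches the version suffix, and
# "\\1" writes the captured ID back in place of the whole match.
capture_result <- sub("^(ENST\\d+)\\.\\d+$", "\\1", x)

question_result  # identical to x
capture_result   # "ENST00000338167" "ABCDE.42927.6"
```

Even if the question's pattern had matched, replacing the match with "" would also have deleted the "ENST" prefix, which is why the capture group and the "\\1" back-reference are needed.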
Upvotes: 2
Reputation: 389325
We can use a combination of ifelse, grepl and sub. We first check whether the string starts with "ENST" and, if it does, remove the first "." and everything after it using sub.
ifelse(grepl("^ENST", x), sub("\\..*", "", x), x)
#[1] "ENST00000338167" "ABCDE.42927.6" "ENST00000265393" "ABCDE.43577.3"
#[5] "ENST00000370826"
data
x <- c("ENST00000338167.9","ABCDE.42927.6","ENST00000265393.10",
"ABCDE.43577.3","ENST00000370826.3")
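Note that "\\..*" cuts at the first '.', while the "\\.[^.]*$" pattern from the question cuts at the last one. For the IDs above, which contain a single '.', both give the same result. A small sketch with a made-up ID containing two dots (y is hypothetical and not part of the original data) shows where they would differ:

```r
# Hypothetical ID with two dots, not taken from the original data.
y <- "ENST00000338167.9.1"

# Drops everything from the first "." onward.
first_dot <- sub("\\..*", "", y)

# Drops only the last "." and what follows it.
last_dot <- sub("\\.[^.]*$", "", y)

first_dot  # "ENST00000338167"
last_dot   # "ENST00000338167.9"
```

Which of the two is wanted depends on whether an 'ENST' entry can ever carry more than one '.'; the data in the question does not say.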
Upvotes: 3