I have a question similar to the one asked here: "r Remove parts of column name after certain characters". However, I have a slight wrinkle: my column titles have formats such as ENSG00000124564.16 and ENSG00000257509.1, and I want to remove all characters after the "." symbol.
I cannot just remove the last x characters, as the column titles vary in the number of characters after the "." symbol.
If I follow the sub() command from the previous question, like this: sub(".*", "", colnames(dataset[6:ncol(dataset)])), it does nothing. I assume this is because in the usual command the "." symbol separates the string you are searching for, and the "*" symbol represents anything after it.
How do I alter the code so that "." is treated as the literal character to search for? This is probably a very simple question.
You can escape the period with a double backslash, "\\.", like this:
x <- "ENSG00000124564.16"
sub("\\..*", "", x)
#[1] "ENSG00000124564"
## it also works on a vector of strings
x <- c("ENSG00000124564.16", "ENSG00000257509.1")
sub("\\..*", "", x)
# [1] "ENSG00000124564" "ENSG00000257509"
## and it works for renaming the columns of a data frame
df <- data.frame(ENSG00000124564.16 = c(1, 2, 3), ENSG00000257509.1 = c(1, 1, 1))
names(df) <- sub("\\..*", "", names(df))
df
# ENSG00000124564 ENSG00000257509
#1 1 1
#2 2 1
#3 3 1
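A small sketch of why the backslash is doubled: the R string literal "\\." contains two characters, a backslash and a dot, and the regex engine reads that \. as a literal dot. A bracketed character class, "[.]", is an equivalent way to match a literal dot; it is swapped in here only as an alternative and is not part of the answer above. The vector name ids is just example data.

```r
ids <- c("ENSG00000124564.16", "ENSG00000257509.1")

# The R literal "\\." is two characters long: a backslash and a dot
print(nchar("\\."))  # 2

# An escaped dot and a character class both match a literal "."
escaped   <- sub("\\..*", "", ids)
bracketed <- sub("[.].*", "", ids)

print(identical(escaped, bracketed))  # TRUE
print(escaped)
```

Inside square brackets the dot loses its "any character" meaning, which is why "[.]" needs no backslash; which form to use is mostly a matter of taste.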
With "\\." you indicate a literal dot. With "." you indicate any single character. With ".*" you indicate any character, any number of times. With "$" you indicate the end of the string. So you can put those together like this:
df <- data.frame(ENSG00000124564.16=c(1,2,3), ENSG00000257509.1=c(4,5,6))
df
colnames(df) <- gsub("\\..*$", "", colnames(df))
df
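To make the pieces above concrete, here is a minimal sketch comparing the pattern ".*" with "\\..*$" on a plain character vector (x is just example data). It also shows that sub() returns a new vector rather than modifying its input, so column names only change when the result is assigned back, as in colnames(df) <- ...

```r
x <- c("ENSG00000124564.16", "ENSG00000257509.1")

# ".*" means "any character, any number of times": it matches the whole
# string, so every element is replaced by an empty string
all_gone <- sub(".*", "", x)
print(all_gone)  # [1] "" ""

# "\\." starts the match at the first literal dot; ".*$" then takes
# everything after it up to the end of the string
stripped <- sub("\\..*$", "", x)
print(stripped)

# sub() returns a new vector and leaves its input untouched
print(x)
```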
edit: sm925 was too fast for my slow typing :)