Phil D

Reputation: 193

Removing characters in column titles after "."

I have a question similar to the one asked here: "r Remove parts of column name after certain characters", but with a slight wrinkle. My column titles have formats such as ENSG00000124564.16 and ENSG00000257509.1, and I want to remove the . and all characters after it.

I cannot just remove the last x characters, as the number of characters after the . symbol varies between column titles.

If I follow the sub() command from the previous question, as in sub(".*", "", colnames(dataset[6:ncol(dataset)])), it does nothing. I assume this is because in the original command the . symbol is used to separate the string you are searching for, and the * symbol represents anything after it.

How do I alter the code to use . as the string search symbol? This is probably a very simple question.

Upvotes: 1

Views: 848

Answers (2)

sm925

Reputation: 2678

You can escape the period with a double backslash, \\., so that it is matched literally:

x <- "ENSG00000124564.16"
sub("\\..*", "", x)
#[1] "ENSG00000124564"

update:

## it also works on a character vector of several strings
x <- c("ENSG00000124564.16",  "ENSG00000257509.1")
sub("\\..*", "", x)
# [1] "ENSG00000124564" "ENSG00000257509"

## it also works for renaming the columns of a data frame
df <- data.frame(ENSG00000124564.16 = c(1, 2, 3), ENSG00000257509.1 = c(1, 1, 1))
names(df) <- sub("\\..*", "", names(df))
df
#  ENSG00000124564 ENSG00000257509
#1               1               1
#2               2               1
#3               3               1
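
The question only wants to rename the columns from position 6 onward. A minimal sketch of that, assuming a made-up data frame called dataset with five annotation columns (a to e are hypothetical names) followed by the gene columns. The point is to index into names(dataset) on both sides of the assignment so the result is actually stored:

```r
# Hypothetical data: 5 annotation columns followed by 2 gene columns
dataset <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5,
                      ENSG00000124564.16 = 6, ENSG00000257509.1 = 7)

# Positions of the columns to rename (6 to the last column, as in the question)
idx <- 6:ncol(dataset)

# Strip the first literal dot and everything after it, only for those columns
names(dataset)[idx] <- sub("\\..*", "", names(dataset)[idx])

names(dataset)
```

Calling sub() without assigning the result back leaves the data frame unchanged, which may be part of why the original attempt appeared to do nothing.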

Upvotes: 5

NicolasH2

Reputation: 804

\\. matches a literal dot, whereas an unescaped . matches any single character. .* matches any character repeated any number of times, and $ anchors the match to the end of the string. You can put those together like this:

df <- data.frame(ENSG00000124564.16=c(1,2,3), ENSG00000257509.1=c(4,5,6))
df

colnames(df) <- gsub("\\..*$", "", colnames(df))
df
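
To see the difference between the escaped and unescaped dot, here is a small sketch on a plain character vector (x, unescaped, escaped and bracketed are just illustrative names). The [.] form is an alternative way to write a literal dot that avoids the backslashes:

```r
x <- c("ENSG00000124564.16", "ENSG00000257509.1")

# Unescaped: ".*" matches the whole string, so everything is removed
unescaped <- sub(".*", "", x)

# Escaped: match a literal dot, then everything up to the end of the string
escaped <- sub("\\..*$", "", x)

# Same idea using a character class instead of a backslash escape
bracketed <- sub("[.].*$", "", x)

unescaped
escaped
bracketed
```

Since the pattern already runs to the end of the string, sub() and gsub() give the same result here.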

edit: sm925 was too fast for my slow typing :)

Upvotes: 3
