Suzanne Vogelezang

Reputation: 23

Remove part of value in column

I have a dataframe with the following two columns:

RSID rs12345. rs3984. rs12398432. rs79372.  etc   
ALLELE A. C. T. G. etc

Now I need to remove the trailing . from each value in the RSID and ALLELE columns.

I tried this option:

df$RSID[df$RSID == "."] <- " "

df$ALLELE[df$ALLELE == "."] <- " "

But unfortunately it did not work. Do you have suggestions?

Upvotes: 2

Views: 152

Answers (1)

akrun

Reputation: 887173

Since we want to remove the last character, which is a dot (.), we can use either sub or substr. With sub, loop over the columns of interest, match a literal . at the end ($) of the string, and replace it with an empty string ("").

df[c("RSID", "ALLELE")] <- lapply(df[c("RSID", "ALLELE")], function(x) sub("\\.$", "", x))
df
#        RSID ALLELE
#1    rs12345      A
#2     rs3984      B
#3 rs12398432      C
#4    rs79372      D
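
To see why the original attempt changed nothing, and why the dot is escaped in the pattern, here is a minimal sketch. The vector x and the result names are made up for illustration; only base R functions are used.

```r
# Illustrative values modelled on the question (x is a hypothetical name)
x <- c("rs12345.", "rs3984.")

# `==` compares whole strings, so no element is exactly "."
eq <- x == "."
eq
# [1] FALSE FALSE

# An unescaped "." matches any character, so the last character
# is removed even when it is not a dot
unescaped <- sub(".$", "", "rs12345")
unescaped
# [1] "rs1234"

# Escaping the dot restricts the match to a literal trailing "."
escaped <- sub("\\.$", "", c("rs12345.", "rs12345"))
escaped
# [1] "rs12345" "rs12345"
```

The escaped pattern leaves values without a trailing dot untouched, which makes it safe to run more than once on the same column.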

A faster option is substr, which keeps every character except the last one. This assumes that every value really does end with a dot.

df[c("RSID", "ALLELE")] <- lapply(df[c("RSID", "ALLELE")], 
                       function(x) substr(x, 1, nchar(x)-1))
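
Since substr drops the last character whatever it is, a value without a trailing dot would lose a real character. A minimal sketch of a guarded variant, assuming base R 3.3.0 or later for endsWith; the vector x and the names trimmed and guarded are hypothetical:

```r
# Second value has no trailing dot (hypothetical input)
x <- c("rs12345.", "rs3984")

# Unconditional trimming removes a real character from the second value
trimmed <- substr(x, 1, nchar(x) - 1)
trimmed
# [1] "rs12345" "rs398"

# Guarded version: only trim values that actually end with "."
guarded <- ifelse(endsWith(x, "."), substr(x, 1, nchar(x) - 1), x)
guarded
# [1] "rs12345" "rs3984"
```

If the columns are known to end with a dot in every row, as in the example data, the unguarded substr call is enough.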

data

df <- data.frame(RSID = c("rs12345.", "rs3984.", "rs12398432.", "rs79372."),
        ALLELE = paste0(LETTERS[1:4], "."), stringsAsFactors=FALSE)

Upvotes: 2
