Reputation: 23
I have a dataframe with the following two columns:
RSID rs12345. rs3984. rs12398432. rs79372. etc
ALLELE A. C. T. G. etc
Now I need to remove the trailing . from each value in the RSID and ALLELE columns.
I tried this option:
df$RSID[df$RSID == "."] <- " "
df$ALLELE[df$ALLELE == "."] <- " "
But unfortunately it did not work. Do you have suggestions?
Upvotes: 2
Views: 152
Reputation: 887173
As we want to remove the last character, which is a dot (.), we can use either sub or substr. Loop over the columns of interest, match the literal . (escaped as \\. because an unescaped . matches any character) at the end ($) of the string, and replace it with an empty string (""):
df[c("RSID", "ALLELE")] <- lapply(df[c("RSID", "ALLELE")], function(x) sub("\\.$", "", x))
df
# RSID ALLELE
#1 rs12345 A
#2 rs3984 B
#3 rs12398432 C
#4 rs79372 D
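To see why the comparison in the question has no effect while the pattern-based replacement works, here is a small sketch. The vector x is a made-up example that mirrors the RSID values; it is not data from the question.

```r
# Hypothetical example vector, mirroring the RSID values in the question
x <- c("rs12345.", "rs3984.")

# `==` compares each whole string with ".", so no element matches
whole_match <- x == "."

# "\\.$" matches a literal dot only at the very end of each string
stripped <- sub("\\.$", "", x)

# Unescaped, "." is a regex wildcard: ".$" removes the last character,
# whatever it is, so values without a trailing dot get truncated
wildcard <- sub(".$", "", c("rs12345", "rs3984."))
```

Because whole_match is FALSE everywhere, the assignment in the question never selects anything to overwrite.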
A faster option is substr, which keeps every character except the last one. It drops the last character whatever it is, so it assumes every value ends with a dot.
df[c("RSID", "ALLELE")] <- lapply(df[c("RSID", "ALLELE")],
function(x) substr(x, 1, nchar(x)-1))
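If some values might lack the trailing dot, a guarded variant could check for it first. This is only a sketch: strip_trailing_dot is a hypothetical helper name, not part of the original answer, and it assumes character input and base R's endsWith (available since R 3.3.0).

```r
# Hypothetical helper: drop the last character only when it is a dot
strip_trailing_dot <- function(x) {
  has_dot <- endsWith(x, ".")                      # TRUE where the value ends in "."
  ifelse(has_dot, substr(x, 1, nchar(x) - 1), x)   # leave other values untouched
}

# Made-up input mixing values with and without a trailing dot
cleaned <- strip_trailing_dot(c("rs12345.", "rs3984", "A."))
```

It can be applied to the columns in the same way, for example with lapply(df[c("RSID", "ALLELE")], strip_trailing_dot).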
The example data used above:
df <- data.frame(RSID = c("rs12345.", "rs3984.", "rs12398432.", "rs79372."),
                 ALLELE = paste0(LETTERS[1:4], "."), stringsAsFactors=FALSE)
Upvotes: 2